
PROGRAMS

GRAMMARS 

ARGUMENTS 



Joachim Lambek 






Contents 



1 Calculations on an abacus
1.1 What is a calculation?
1.2 Calculable functions
1.3 Primitive recursive functions are calculable
1.4 Some primitive recursive functions
1.5 The minimization scheme
1.6 Recursive functions and relations
1.7 How to calculate the nth prime

2 More about calculability
2.1 What is a program?
2.2 Calculable functions are recursive
2.3 Digression: enumerations
2.4 Example of a noncalculable function
2.5 Turing programs
2.6 Effective procedures

3 Syntactic types and semigroups
3.1 A simple grammar using types
3.2 Analysis of the English verb phrase
3.3 Syntactic calculus
3.4 Multiplicative systems
3.5 Semigroups and monoids

4 Context-free grammars
4.1 Grammars for some fragments of English
4.2 Production grammars and deductive systems
4.3 Grammars of some formal languages
4.4 New languages from old
4.5 How are sentences produced and recognized?
4.6 Machines for producing and recognizing sentences
4.7 Derivations and parsing trees
4.8 A necessary condition for a language to be context-free

5 Generative and transformational grammars
5.1 Inflection rules: conjugation
5.2 More about tense and verb phrase
5.3 More about the noun phrase
5.4 Passive construction, transformational grammar
5.5 Some other transformations

References

Appendix: A mathematician looks at the French verb
6 Introduction
7 The present tense
8 The other tenses
9 Factorizing the conjugation matrix
10 Radicals of ordinary verbs
11 Radicals of exceptional verbs
12 A production grammar for a fragment of French



Preface 



These notes began as the notes for a course called "Computability and Mathematical Linguistics" taught at McGill University for about 25 years, beginning in 1974. It was quite successful, but after Professor Lambek and I retired, there was no one sufficiently interested in teaching the course as designed, and it eventually disappeared. There was a proposal to add quite a bit of logic to the notes and publish them jointly with Phil Scott and me, but this project never went much beyond putting the original notes (augmented by the paper on the French verb) into TeX. Now, nearly fifteen years later, Professor Lambek and I decided to put this online in case there is any interest. There are two additional comments by me on pages 82 and 115.

Michael Barr, Montreal, 2007-08-16. 






Chapter 1 

Calculations on an abacus 



1.1 What is a calculation? 

Some people say that a calculation is what can be done on an electronic computer, while 
others would be satisfied with pencil and paper. To give a precise answer to this question, 
Turing in 1936 invented an abstract device, later called a "Turing machine". His idea 
proved to be very fruitful, both in answering some theoretical questions on so-called 
"word problems" and in stimulating the development of the modern electronic computer. 
However, there is an easier concept that goes back to antiquity and will do the same thing. 

The word "calculate" is derived from the Latin word for pebble. At one time these 
pebbles were placed into grooves on a table, called "abacus" from a Greek word for table. 
Let us here define an abacus as consisting of 

(a) a potentially infinite number of locations A, B, C . . ., that may be distinguished 
from one another, and 

(b) a potentially infinite number of pebbles that need not be distinguishable. 
Let it also be understood that 

(c) each location can hold as many pebbles as one wishes. 

It will turn out that for a given calculation only a finite number of locations and 
pebbles are needed, but we do not wish to impose a bound on this number beforehand. 
In the same way, when talking about calculations with paper and pencil, one does not 
want to limit the amount of paper and the number of pencils to be used. 

Essentially there are only two operations one can perform on an abacus. One can put 
a pebble into a designated location, or one can take a pebble from a designated location. 
The first operation can always be performed, the second can only be performed if the location is not empty. Accordingly there are two elementary instructions, as indicated by the following diagrams:

[Diagram: the two elementary instructions, X+ with a single outgoing arrow, and Y- with two outgoing arrows.]

The first instruction reads: go to location X, deposit one pebble, then go to wherever the outgoing arrow points. The second instruction reads: go to location Y, remove one pebble if this can be done; in that case go to wherever the left arrow points; in the contrary case (that is, when Y is found to be empty), go to wherever the right arrow points. It may be convenient to consider two further elementary instructions:

start 

stop 

The first of these says: at the beginning go to wherever the arrow points. The second one 
says: stop. 

By a program we shall mean a finite number of such elementary instructions fitted together in a specified manner. We shall not make this precise at the moment; instead we shall illustrate some simple programs by their flow diagrams.

1.1.1 Example Empty location X. 



[Flow diagram: start → X-; the "yes" arrow returns to X-, the "no" arrow leads to stop.]





What does this picture mean? Instruction: remove one pebble from location X. If this can be done, return to the same instruction; otherwise, stop. Suppose, for example, there are 3 pebbles at X to start with; then the successive contents of location X are indicated as follows:

X
3
2
1
0

In general, if the initial content of X is x pebbles, then its final content is 0 pebbles.
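As an illustration, the abacus model and this emptying program can be sketched in a few lines of Python; representing an abacus by a dictionary from location names to pebble counts is merely one convenient choice.

```python
# A minimal sketch of the abacus: each location holds a count of pebbles.
# The two functions mirror the elementary instructions X+ and X-.

def plus(abacus, loc):
    """Deposit one pebble; this can always be done."""
    abacus[loc] = abacus.get(loc, 0) + 1

def minus(abacus, loc):
    """Try to remove one pebble; report whether it could be done."""
    if abacus.get(loc, 0) > 0:
        abacus[loc] -= 1
        return True   # follow the "yes" arrow
    return False      # follow the "no" arrow

def empty(abacus, x):
    """Example 1.1.1: keep removing pebbles from X until it is empty."""
    while minus(abacus, x):
        pass

a = {"X": 3}
empty(a, "X")
print(a)  # {'X': 0} -- the final content of X is 0 pebbles
```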






1.1.2 Example Transfer content of X to Y. 



[Flow diagram: start → X-; the "yes" arrow leads to Y+, which returns to X-; the "no" arrow leads to stop.]

What does this flow diagram mean? 
Instruction (1): remove one pebble from location X. If this can be done, go to instruction 
(2). If it cannot be done, stop. 

Instruction (2): add one pebble to location Y. Then go to instruction (1). For example, 
suppose there are 2 pebbles at location X and 3 pebbles at location Y to start with. 

Then the successive contents of those two locations are as follows:

X   Y
2   3
1   4
0   5

We shall use the word configuration to denote a complete listing of the contents of all locations, but we only mention those locations which are nonempty at some stage of a calculation. In the present example, an initial configuration of x pebbles at X and y pebbles at Y is transformed by the program to a terminal configuration of 0 pebbles at X and y + x pebbles at Y.

1.1.3 Example Copy content of X into Y.



[Flow diagram: start → X-; the "yes" arrow leads to Y+, then T+, then back to X-; the "no" arrow leads to the subroutine "transfer content of T to X", then stop.]

In discussing the performance of this program, we assume that T is empty at the 
beginning of the calculation. It follows that T is also empty at the end. We call T a tem- 
porary storage location. It serves the same purpose as scratch paper when calculating 
with paper and pencil. 






In the above flow diagram, we have used T → X, which means "transfer content of T to X", as a so-called subroutine. In accordance with Example 1.1.2, the rectangle is to be replaced by the flow diagram of that example. Note that the start and stop instructions within the subroutine should not be included, although the arrows going to and from them serve as the entry and exit points of the subroutine. We will always put a subroutine in a box.

Assuming the same initial configuration as in Example 1.1.2, with T empty, here are 
successive configurations for Example 1.1.3. 

X   Y   T
2   3   0
1   4   1
0   5   2
2   5   0



The astute reader will have noticed that the programs of Examples 1.1.2 and 1.1.3 
could have served stone-age man in the task of adding two numbers. 
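In the same sketch style as before (our Python rendering, with the same plus and minus helpers), the programs of Examples 1.1.2 and 1.1.3 become loops:

```python
def plus(abacus, loc):
    abacus[loc] = abacus.get(loc, 0) + 1

def minus(abacus, loc):
    if abacus.get(loc, 0) > 0:
        abacus[loc] -= 1
        return True
    return False

def transfer(a, x, y):
    """Example 1.1.2: move the content of X to Y, one pebble at a time."""
    while minus(a, x):     # instruction (1)
        plus(a, y)         # instruction (2), then back to (1)

def copy(a, x, y, t="T"):
    """Example 1.1.3: copy X into Y, using the temporary location T."""
    while minus(a, x):
        plus(a, y)
        plus(a, t)
    transfer(a, t, x)      # the subroutine restores X and empties T

a = {"X": 2, "Y": 3}
copy(a, "X", "Y")
print(a)  # {'X': 2, 'Y': 5, 'T': 0} -- stone-age addition
```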



Exercises 

1. What do the following programs do: 
[Three flow diagrams, built from the instructions X-, X+, Y+ and Y-, lost in transcription.]

(In the second and third diagrams the stop instruction has been written more than once to simplify the drawing of the diagrams. Nonetheless, there is only one stop. This is necessary in order that when a program is used as a subroutine, it will have just one exit.)

2. Draw the flow diagram for a program which will result in interchanging the contents 
of locations X and Y. 



1.2 Calculable functions 

It is convenient to include 0 among the natural numbers. The set of natural numbers is thus N = {0, 1, 2, ...}. By a numerical function of one variable we mean a mapping f : N → N, and we write y = f(x) to indicate that the value of f at x is y. By a numerical function of two variables we mean a mapping f : N × N → N, and we write z = f(x, y) to indicate that f assigns to the pair (x, y) the value z. We are interested in calculating numerical functions of any number of variables on an abacus.

We shall say that a program calculates the function y = f(x) with the help of the temporary storage location T if it transforms the configuration

X   Y   T
x   0   0

to

X   Y      T
x   f(x)   0

It is assumed that Y and T are empty at the beginning of the calculation. T may be occupied at some stages during the calculation, but must be empty at the end. No other locations are supposed to be involved in the calculation, as we did not specify any. The fact that X has the same content at the end of the calculation as at the beginning is required for technical reasons: we don't want the input to be lost. Otherwise we could have permitted the final configuration

X   Y      T
0   f(x)   0



We will denote by Y := f(x) a machine that assigns to the location Y the value of the function f evaluated at the contents x of the location X and leaves all other locations unchanged, that is, the machine that calculates y = f(x). Similarly, a machine that calculates z = f(x, y) will be denoted Z := f(x, y).



For example, the identity function y = x, the successor function y = S(x) = x + 1, and the constant function y = 3 are calculated by the following three programs:



start → [copy X into Y] → stop

start → [Y := x] → Y+ → stop

start → Y+ → Y+ → Y+ → stop



This is shown by the initial and final configurations:

X   Y   T     X   Y      T     X   Y   T
x   0   0     x   0      0     x   0   0
x   x   0     x   S(x)   0     x   3   0

Note that the subroutine copy X into Y is exactly the same as Y := x . 



Hard-working students are invited to draw the flow diagram for the constant function y = 17.

Aside from such trivial functions, what other functions can be calculated on an abacus? 
Consider the following numerical functions: 

y = the (x + 1)st prime number,

y = the number of primes less than x,

y = the (x + 1)st digit of π.

All these can be calculated on the abacus, and for two of them we shall construct 
programs later. In fact, at the moment, the reader will probably not be able to think of 
a numerical function that cannot be calculated on an abacus. 

More generally, we say that a program calculates a numerical function z = g(x_1, x_2, ..., x_n) of n variables with the help of temporary storage locations T_1, T_2, ..., T_k, if its initial and final configurations are as follows:

X_1   X_2   ...   X_n   Z                  T_1   ...   T_k
x_1   x_2   ...   x_n   0                  0     ...   0
x_1   x_2   ...   x_n   g(x_1, ..., x_n)   0     ...   0





For example, here is a simple-minded program for calculating z = x + y:

start
↓
copy X into Z
↓
copy Y into Z
↓
stop

We shall presently give another program for calculating addition which is more sophisti- 
cated, but which has the advantage of serving as a model for programs that will calculate 
multiplication and exponentiation. 

What is meant by the sum x + y of two numbers? We do know that x + 0 = x. Moreover, if we happen to know the value of x + 173, then we also know that x + 174 is one more.

We shall write "S(y)" for the successor of y. (We are not allowed to call this y + 1 until after addition has been defined.) We then have quite generally

x + S(y) = S(x + y)

The sum of two numbers is now completely determined, once we decide to define the natural numbers by

1 = S(0), 2 = S(1), 3 = S(2), ... etc.

For example, we have

3 + 2 = 3 + S(1) = S(3 + 1)

where

3 + 1 = 3 + S(0) = S(3 + 0) = S(3) = 4

hence

3 + 2 = S(4) = 5

The definition of addition discussed here is called a recursive definition.

We shall now present programs for the functions z = x + y, z = x * y, z = x^y. Note that these functions are defined recursively as follows:

x + 0 = x                 x * 0 = 0                  x^0 = 1
x + S(y) = S(x + y)       x * S(y) = (x * y) + x     x^S(y) = x^y * x






where 1 = S(0). 
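These recursive definitions can be transcribed directly, as a sketch, using nothing but the successor function and recursion on y:

```python
def S(x):          # successor
    return x + 1

def add(x, y):     # x + 0 = x;  x + S(y) = S(x + y)
    return x if y == 0 else S(add(x, y - 1))

def mul(x, y):     # x * 0 = 0;  x * S(y) = (x * y) + x
    return 0 if y == 0 else add(mul(x, y - 1), x)

def power(x, y):   # x^0 = 1;  x^S(y) = x^y * x
    return 1 if y == 0 else mul(power(x, y - 1), x)

print(add(3, 2), mul(3, 2), power(3, 2))  # 5 6 9
```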

Here is the program to calculate z = x + y: 

[Flow diagram: start → transfer Y to T → Z := x → T-; when a pebble can be removed from T: Y+ → transfer Z to U → calculate z = S(u) → back to T-; when T is empty: stop.]
X   Y   Z     T     U
x   y   0     0     0
x   0   x     y     0
x   1   x+1   y-1   0
x   2   x+2   y-2   0
...
x   y   x+y   0     0



Some of the configurations which appear during the calculation are shown in the above 
table. In addition to the temporary storage locations T and U, there are also other 
temporary storage locations which are used in the subroutines. 
To calculate z = x * y and z = x^y one need only replace the two subroutines Z := x and calculate z = S(u) in the above flow diagram: by Z := 0 and calculate z = u + x for multiplication, and by Z := 1 and calculate z = u * x for exponentiation.
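The shape of the program, an initial subroutine followed by a loop that rebuilds z from its previous value u, can be sketched as follows; the parameters g and h stand for the two replaceable subroutines, so the same loop yields addition, multiplication and exponentiation:

```python
def by_recursion(g, h, x, y):
    """Compute f(x, y) where f(x, 0) = g(x) and f(x, S(y)) = h(x, y, f(x, y)).
    The loop plays the role of T-: it runs y times, and u holds the old z."""
    z = g(x)
    for i in range(y):
        u = z               # transfer Z to U
        z = h(x, i, u)      # calculate z = h(x, y, u)
    return z

add = lambda x, y: by_recursion(lambda x: x, lambda x, i, u: u + 1, x, y)
mul = lambda x, y: by_recursion(lambda x: 0, lambda x, i, u: u + x, x, y)
exp = lambda x, y: by_recursion(lambda x: 1, lambda x, i, u: u * x, x, y)
print(add(3, 2), mul(3, 2), exp(3, 2))  # 5 6 9
```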



Exercises 

1. Draw the flow diagram of a program for the function z = x^2. (You may regard this as a special case of z = x * y or of z = x^y.)






2. Which function y = f(x) is calculated by the following program:

start
↓
calculate t = x
↓
calculate u = x + 1
↓
calculate y = t * u
↓
empty T
↓
empty U
↓
stop

3. Draw the flow diagram of a program to calculate y = x^2 + 2x + 2.

4. The function z = x ↑ y is defined thus: x ↑ 0 = 1, x ↑ S(y) = x^(x ↑ y). For instance, x ↑ 3 = x^(x^x). Write out a program.



1.3 Primitive recursive functions are calculable 

The functions x + y, x * y, and x ↑ y are all examples of "primitive recursive" functions. Probably all the numerical functions you can think of are primitive recursive. By a numerical function of k variables we understand a function y = f(x_1, x_2, ..., x_k), where x_1, x_2, ..., x_k and y all range over the set of natural numbers. When k = 0, this reduces to y = c, where c is a natural number.

We shall now list a number of numerical functions and some schemes for producing 
new functions from known functions. 

R-1. successor function: f(x) = S(x)

R-2. constant functions: f(x_1, ..., x_n) = k

R-3. projection functions: f(x_1, ..., x_n) = x_k

R-4. substitution scheme:

f(x_1, ..., x_n) = g(h_1(x_1, ..., x_n), ..., h_k(x_1, ..., x_n))






R-5. recursion scheme:

f(x_1, ..., x_{n-1}, 0) = g(x_1, ..., x_{n-1})
f(x_1, ..., x_{n-1}, S(y)) = h(x_1, ..., x_{n-1}, y, f(x_1, ..., x_{n-1}, y))

Remarks. R-2 and R-3 are families of functions, one for each pair of natural numbers n and k. R-3 includes the identity function f(x) = x as a special case, when n = k = 1. R-4 produces one function f of n variables, given one function g of k variables and k functions h_1, ..., h_k of n variables. R-5 recursively defines a function f of n variables, given a function g of n - 1 variables and a function h of n + 1 variables. Special cases of R-5 are the following:

R-5, case n = 1:

f(0) = c
f(S(y)) = h(y, f(y))

R-5, case n = 2:

f(x, 0) = g(x)
f(x, S(y)) = h(x, y, f(x, y))

Informally we may define primitive recursive functions as all functions of type R-1, R-2 and R-3, as well as all functions derivable from these by schemes R-4 and R-5. More formally, a mathematician would put this as follows:

1.3.1 Definition The class of primitive recursive functions is the smallest class of numerical functions containing all functions of types R-1, R-2 and R-3 and closed under schemes R-4 and R-5.

To say that a class of numerical functions is closed under scheme R-4 means that if h_1(x_1, ..., x_n), ..., h_k(x_1, ..., x_n) and g(y_1, ..., y_k) are in the class, then so is f(x_1, ..., x_n) as defined by R-4.

To say that a class of numerical functions is closed under scheme R-5 means that if g(x_1, ..., x_{n-1}) and h(x_1, ..., x_{n-1}, y, z) are in the class, then so is f(x_1, ..., x_{n-1}, y) as defined by R-5.

For example, why is x + y a primitive recursive function? We recall that the recursive definition of x + y reads

x + 0 = x
x + S(y) = S(x + y)

It will be convenient to denote the primitive recursive function defined by R-3 by p_k^n, thus

p_k^n(x_1, ..., x_n) = x_k

We note that x = p_1^1(x) and that x + y = p_3^3(x, y, x + y), hence the recursive definition of x + y may be written thus:

x + 0 = p_1^1(x)
x + S(y) = S(p_3^3(x, y, x + y))






Then f(x, y) = x + y is primitive recursive by R-5, case n = 2, if we take g(x) = p_1^1(x) and h(x, y, z) = S(p_3^3(x, y, z)). Here g is primitive recursive by R-3, and h is primitive recursive by R-4, with n = 3 and k = 1, since S is primitive recursive by R-1 and p_3^3 is primitive recursive by R-3.

Similarly one shows that x * y is primitive recursive. Now consider x!; this satisfies the recursive definition

0! = 1
S(x)! = S(x) * x!

Then f(x) = x! is primitive recursive by R-5, case n = 1, if we take c = 1 and h(y, z) = S(y) * z = S(p_1^2(y, z)) * p_2^2(y, z). Here h is primitive recursive by R-4, with n = 2 and k = 2, since multiplication is primitive recursive, by the above observation, and S and the projection functions are primitive recursive by R-1 and R-3 respectively.
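The closure conditions can be made quite concrete. Here is a sketch in which the schemes R-1 to R-5 are the only building blocks; subst and primrec below are our names for R-4 and R-5, and x + y is assembled exactly as in the derivation just given:

```python
S = lambda x: x + 1                                   # R-1

def const(k):                                         # R-2
    return lambda *xs: k

def p(n, k):                                          # R-3: projection p_k^n
    return lambda *xs: xs[k - 1]

def subst(g, *hs):                                    # R-4
    return lambda *xs: g(*(h(*xs) for h in hs))

def primrec(g, h):                                    # R-5, unrolled as a loop
    def f(*xs):
        *front, y = xs
        z = g(*front)
        for i in range(y):
            z = h(*front, i, z)
        return z
    return f

# x + y, with g = p_1^1 and h = S o p_3^3:
add = primrec(p(1, 1), subst(S, p(3, 3)))
print(add(3, 2))   # 5
```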

1.3.2 THEOREM Every primitive recursive function is calculable. 



Proof We must show that the class of calculable functions contains all numerical functions of types R-1, R-2, and R-3, and that it is closed under schemes R-4 and R-5. For concreteness, we take n = 3 and k = 2 in R-2, R-3, and R-4, and n = 2 in R-5. The proof then consists in inspecting the following five programs.



R-1. Program for y = S(x):

start → [copy X into Y] → Y+ → stop

R-2. Program for w = f(x, y, z) = 2:

start → W+ → W+ → stop




R-3. Program for w = f(x, y, z) = y:

start → [copy Y into W] → stop

R-4. Program for w = g(h_1(x, y, z), h_2(x, y, z)):

start → [calculate u = h_1(x, y, z)] → [calculate v = h_2(x, y, z)] → [calculate w = g(u, v)] → [empty U] → [empty V] → stop

R-5. Program to calculate z = f(x,y), when 

f(x,0) = g(x) 
f(x,S(y)) = h(x,y,f(x,y)) 






[Flow diagram: the same program as for z = x + y, with Z := x replaced by the subroutine calculating z = g(x), and calculate z = S(u) replaced by calculate z = h(x, y, u).]




For example, take f(x, y) = x + y. Then we have seen that g(x) = x and h(x, y, u) = S(u). Hence the above program specializes to that given for x + y in Section 1.2. The same goes for x * y and x^y.






Exercise 

1. What function in one or two variables is calculated by the following program? Explain 
why this is so. 

[Flow diagram: a recursion loop, lost in transcription, whose body contains the subroutine calculate z = u + 2y + 1.]




1.4 Some primitive recursive functions 

The following list of primitive recursive functions is taken from the book by Kleene. We have already met (1) to (4). (5) deals with the "predecessor function", the predecessor of 0 being defined exceptionally as zero. (6) deals with the "naive difference" between x and y, which is considered to be 0 when y > x. (7) and (8) deal with the usual maximum and minimum of a pair of numbers. (9) is called the "delta function", (10) the "sign function", and (11) the "absolute difference".

(1) x + y      x + 0 = x
               x + S(y) = S(x + y)

(2) x * y      x * 0 = 0
               x * S(y) = (x * y) + x

(3) x^y        x^0 = 1
               x^S(y) = x^y * x

(4) x!         0! = 1
               S(x)! = x! * S(x)






(5) pre(x) = x - 1 if x > 0       pre(0) = 0
           = 0     if x = 0       pre(S(x)) = x

(6) x - y = x - y if x ≥ y        x - 0 = x
          = 0     if x < y        x - S(y) = pre(x - y)

(7) min(x, y) = x - (x - y)

(7') min(x, y, z) = min(min(x, y), z)

(8) max(x, y) = (x + y) - min(x, y) = x + (y - x)

(8') max(x, y, z) = max(max(x, y), z)

(9) δ(x) = 1 if x = 0             δ(0) = 1
         = 0 if x > 0             δ(S(x)) = 0

    Alternative definitions: δ(x) = 1 - x = 0^x

(10) sgn(x) = 0 if x = 0          sgn(0) = 0
            = 1 if x > 0          sgn(S(x)) = 1

    Alternative definitions: sgn(x) = 1 - δ(x) = min(x, 1)

(11) |x - y| = (x - y) + (y - x)
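As a sketch, functions (5) to (11) transcribe directly, with the naive difference written out:

```python
pre = lambda x: x - 1 if x > 0 else 0              # (5) predecessor
monus = lambda x, y: x - y if x >= y else 0        # (6) naive difference x - y
mn = lambda x, y: monus(x, monus(x, y))            # (7) minimum
mx = lambda x, y: monus(x + y, mn(x, y))           # (8) maximum
delta = lambda x: monus(1, x)                      # (9) delta function
sgn = lambda x: mn(x, 1)                           # (10) sign function
absdiff = lambda x, y: monus(x, y) + monus(y, x)   # (11) |x - y|

print(mn(3, 5), mx(3, 5), delta(0), sgn(7), absdiff(2, 6))  # 3 5 1 1 4
```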

The first three functions above suggest a whole sequence of functions f_t(x, y), t = 0, 1, 2, 3, ..., where f_0(x, y) = x + y, f_1(x, y) = x * y, f_2(x, y) = x^y, and f_3(x, y) = x ↑ y, which made an appearance in one of the exercises. We are led to define f_{S(t)}(x, y) thus:

f_{S(t)}(x, 0) = sgn(t)
f_{S(t)}(x, S(y)) = f_t(x, f_{S(t)}(x, y))

It is easily seen, by mathematical induction on t, that each of the functions f_t(x, y) of two variables is primitive recursive.

Let us now consider a function of three variables defined as follows:

f(t, x, y) = f_t(x, y)

In other words,

f(0, x, y) = x + y
f(S(t), x, 0) = sgn(t)
f(S(t), x, S(y)) = f(t, x, f(S(t), x, y))

It is clear that this function is defined recursively, in some sense of the word "recursive". In fact it can be shown that it is calculable on our abacus. Yet it is not primitive recursive, as was shown by Ackermann. Putting g(x) = f(x, x, x), he proved that, for every primitive recursive function h(x), there exists a natural number x such that g(x) > h(x). Thus g(x) cannot be primitive recursive, hence f(t, x, y) cannot be primitive recursive. (Actually, Ackermann considered not this function, but a very similar one.)
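A sketch of f(t, x, y) makes Ackermann's point tangible: the definition runs, but its values grow explosively (sgn is as in (10) above):

```python
def f(t, x, y):
    """f(0,x,y) = x + y;  f(S(t),x,0) = sgn(t);
       f(S(t),x,S(y)) = f(t, x, f(S(t), x, y))."""
    if t == 0:
        return x + y
    if y == 0:
        return 1 if t - 1 > 0 else 0   # sgn(t') where t = S(t')
    return f(t - 1, x, f(t, x, y - 1))

# f(1,-,-) is multiplication, f(2,-,-) exponentiation, f(3,-,-) the tower.
print(f(1, 3, 4), f(2, 3, 4), f(3, 2, 3))  # 12 81 16
```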






Exercise 

1. Show how to calculate functions (5) to (11) above, by drawing flow diagrams of 
programs of the abacus. Make use of subroutines freely. 



1.5 The minimization scheme 



The reader will recall the substitution scheme R-4 and the recursion scheme R-5, which played a role in the definition of the class of primitive recursive functions. Another useful scheme is the following minimization scheme:

R-6. f(x_1, ..., x_n) = smallest y such that g(x_1, ..., x_n, y) = 0.

Given a numerical function g of n + 1 variables, this defines a new numerical function f of n variables provided the following condition is satisfied:

(*) ∀x_1 ... ∀x_n ∃y, g(x_1, ..., x_n, y) = 0

(Here "∀" means "for all" and "∃" means "there exists".)

If condition (*) is not satisfied then f is not a numerical function, but only a partial numerical function; its domain is a proper subset of N × ... × N, the set of all n-tuples of natural numbers.

1.5.1 Proposition In the minimization scheme, if g is calculable so is f. 

Proof To make things concrete, let us consider the special case n = 1. Then R-6 becomes:

f(x) = smallest y such that g(x, y) = 0

and the condition reads

(*) ∀x ∃y, g(x, y) = 0

Now consider the following program:

[Flow diagram: start → calculate z = g(x, y) → Z-; if a pebble could be removed, empty Z, perform Y+, and return to calculate z = g(x, y); if Z was empty, stop.]

X   Y
x   0     g(x, 0) ≠ 0
x   1     g(x, 1) ≠ 0
...
x   y     g(x, y) = 0




We note that when g(x, y) = 0 for the first time, then y = f(x); hence the program does calculate the function f at x. Suppose however that there does not exist a natural number y such that g(x, y) = 0; then the above calculation does not terminate, and f is not defined at x. We observe that Z is a temporary storage location.
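The loop of this flow diagram is an unbounded search, and a sketch of it shows the partiality plainly: if condition (*) fails, the call below never returns.

```python
def minimize(g, x):
    """Smallest y with g(x, y) == 0; loops forever if no such y exists."""
    y = 0
    while g(x, y) != 0:
        y += 1
    return y

# With g(x, y) the naive difference x - y, f(x) = x:
monus = lambda x, y: x - y if x >= y else 0
print(minimize(monus, 7))  # 7
```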

1.5.2 Example Take g(x, y) = x - y. Then f(x) = x. Thus f is the identity function, a special case of R-3. This example illustrates that a function defined by means of scheme R-6 may very well be definable without using scheme R-6.

1.5.3 Example We recall that, for any real number ξ, [ξ] denotes the greatest integer not exceeding ξ. Thus [ξ] = n if and only if n ≤ ξ < n + 1. [ξ] may also be regarded as the smallest integer n such that n + 1 > ξ.

Let us calculate the function y = [x/2] with the help of the minimization scheme. Indeed,

[x/2] = greatest y such that 2y ≤ x
      = smallest y such that 2(y + 1) > x
      = smallest y such that 2(y + 1) ≥ x + 1
      = smallest y such that (x + 1) - 2(y + 1) = 0.

Clearly, condition (*) is satisfied in this case. A program for calculating y = [x/2] is thus given by the following flow diagram, where Z is a temporary storage location:

[Flow diagram: the minimization program with g(x, y) = (x + 1) - 2(y + 1).]

1.5.4 Example More generally we may try to calculate

[x/y] = greatest z such that z * y ≤ x
      = smallest z such that (z + 1) * y > x
      = smallest z such that (z + 1) * y ≥ x + 1
      = smallest z such that (x + 1) - (z + 1) * y = 0






The program will be the following, using W as a temporary storage location:

[Flow diagram: the minimization program computing the smallest z such that (x + 1) - (z + 1) * y = 0.]
Of course, [x/0] is undefined. In fact, it is easily seen that a calculation with this program does not terminate when y = 0.
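Both examples are instances of the same search; in the sketch below, calling quot(x, 0) loops forever, matching the remark about [x/0]:

```python
def smallest(pred):
    """Unbounded search: smallest z satisfying pred; may not terminate."""
    z = 0
    while not pred(z):
        z += 1
    return z

monus = lambda a, b: a - b if a >= b else 0

half = lambda x: smallest(lambda y: monus(x + 1, 2 * (y + 1)) == 0)
quot = lambda x, y: smallest(lambda z: monus(x + 1, (z + 1) * y) == 0)

print(half(7), quot(17, 5))  # 3 3
```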

Exercises 

1. Using the minimization scheme, show how to calculate y = [√x] and y = [log_10(x + 1)]. Draw the flow diagrams.

2. What function is calculated by the following program? Explain how you got your 
answer. 



[Flow diagram lost in transcription.]
1.6 Recursive functions and relations 
We shall make a definition. 



1.6.1 Definition The class of recursive functions is the smallest class of numerical functions which contains all functions of types R-1, R-2, and R-3, and which is closed under schemes R-4, R-5, and R-6, where R-6 satisfies condition (*).






1.6.2 Theorem Every recursive function is calculable. 

Proof This is an immediate consequence of Theorem 1.3.2, which asserts that every 
primitive recursive function is calculable, and of Proposition 1.5.1 which asserts that the 
class of calculable functions is closed under the minimization scheme. ■ 

1.6.3 Definition The class of partial recursive functions is the smallest class of partial numerical functions which contains all functions of types R-1, R-2 and R-3, and which is closed under schemes R-4, R-5, and R-6, where R-6 need not satisfy condition (*).

For example, z = [x/y] is only a partial recursive function, as it is undefined when y = 0. On the other hand, z = [x/(y + 1)] is always defined, hence it is a recursive function. We can artificially introduce a function

[x/y]' = 0      when y = 0
       = [x/y]  when y > 0

Then

[x/y]' = smallest z such that y = 0 or (z + 1) * y > x
It will become clear later that this equation can be written in the form

[x/y]' = smallest z such that g(x, y, z) = 0

where g is a suitable recursive function. Thus z = [x/y]' will be a recursive function. (Actually, it is even a primitive recursive function, as is seen by looking at the book by Kleene.)

We write "R(x, y)" to say that x bears the relation R to y. "R(x, y)" is a statement capable of being true or false, and we might think of R as a function from N × N to {T, F}, the set of truth values. We regard R as a binary relation, and this is a special case of an n-ary relation, when n = 2. Unary relations, when n = 1, are also called properties.

If R and S are relations, binary for the sake of definiteness, we define new relations R ∧ S, R ∨ S and ¬R by saying that R ∧ S (read "R and S") holds at (x, y) if and only if both R(x, y) and S(x, y) hold, R ∨ S (read "R or S") holds at (x, y) if and only if at least one of R(x, y) or S(x, y) does, and ¬R (read "not R") holds at (x, y) if and only if R(x, y) fails. We also say that R ∧ S is the conjunction of R and S, that R ∨ S is the disjunction of R and S and that ¬R is the negation of R.

We shall now define what we mean by a relation or property being recursive. For concreteness we restrict attention to the cases n = 2 and n = 1.






1.6.4 Definition A binary relation R (property P) is said to be recursive if there is a recursive function g such that R(x, y) if and only if g(x, y) = 0 (P(x) if and only if g(x) = 0) for all natural numbers x and y.

For example, ≥ is a recursive relation, since x ≥ y if and only if y - x = 0, and equality is a recursive relation, since x = y if and only if |x - y| = 0. The property of being even is recursive, since x is even if and only if [x/2] * 2 = x, that is, |[x/2] * 2 - x| = 0.

The usefulness of this definition arises from the fact that one can apply the minimization scheme to a recursive relation to produce a recursive function:

f(x) = smallest y such that R(x, y)
     = smallest y such that g(x, y) = 0

The condition (*) of Section 1.5 then becomes:

∀x ∃y R(x, y)

1.6.5 PROPOSITION The class of recursive relations (properties) is closed under con- 
junction, disjunction, and negation. 

Proof Suppose

R(x, y) if and only if g(x, y) = 0

and

S(x, y) if and only if h(x, y) = 0

where g and h are recursive functions. Then

R(x, y) ∧ S(x, y) if and only if g(x, y) + h(x, y) = 0

R(x, y) ∨ S(x, y) if and only if g(x, y) * h(x, y) = 0

and

¬R(x, y) if and only if 1 - g(x, y) = 0

Since +, * and - are all recursive functions, it follows that g(x, y) + h(x, y), g(x, y) * h(x, y) and 1 - g(x, y) are all recursive. ■

It is now clear that the function z = [x/y]' defined earlier is recursive. In fact

[x/y]' = smallest z such that y = 0 or (z + 1) * y ≥ x + 1
       = smallest z such that y = 0 or (x + 1) - (z + 1) * y = 0
       = smallest z such that y * ((x + 1) - (z + 1) * y) = 0.
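The arithmetic behind Proposition 1.6.5 can be sketched directly: a relation is represented by a function vanishing exactly where it holds, and conjunction, disjunction and negation become +, * and 1 - :

```python
monus = lambda a, b: a - b if a >= b else 0

# Relations as zero-tests: R holds at (x, y) iff its function is 0 there.
ge = lambda x, y: monus(y, x)                    # x >= y  iff  y - x = 0
eq = lambda x, y: monus(x, y) + monus(y, x)      # x = y   iff  |x - y| = 0

conj = lambda g, h: lambda x, y: g(x, y) + h(x, y)   # R and S
disj = lambda g, h: lambda x, y: g(x, y) * h(x, y)   # R or S
neg  = lambda g:    lambda x, y: monus(1, g(x, y))   # not R

gt = conj(ge, neg(eq))                           # x > y
print(gt(5, 3) == 0, gt(3, 3) == 0)              # True False
```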






Exercise 

Show that the following properties and relations are recursive, in each case producing a function g as in Definition 1.6.4.

1. x is odd.

2. x is a perfect square, that is, x = [√x]^2.

3. x < y.

4. x ≠ y.

5. x < y < z (ternary relation).

1.7 How to calculate the nth prime 

One says that y divides x, and writes y|x, when ∃z (y * z = x). When y ≠ 0, this is equivalent to saying that y * [x/y] = x. On the other hand, 0|x only when x = 0. Thus, in any case, y|x if and only if y * [x/y]' = x, if and only if x - (y * [x/y]') = 0. Therefore the relation "divides" is recursive.

Let us now look at the property of being prime. A natural number x is called prime, and we write prim(x), provided x > 1 and x has no divisor y > 1 other than x. In other words, prim(x) if and only if

x > 1 and x = smallest y such that y > 1 and y|x

Unfortunately, when x = 1, there is no y such that y > 1 and y|x, hence the minimization scheme which appears in the above definition of prim(x) defines only a partial function. To make this into a function we write

prim(x) if and only if x > 1 ∧ x = φ(x)

where

φ(x) = smallest y such that x ≤ 1 ∨ (y > 1 ∧ y|x)
     = smallest y such that g(x, y) = 0

where

g(x, y) = (x - 1) * ((2 - y) + (x - y * [x/y]'))

Once this has been checked, it follows that

prim(x) if and only if (2 - x = 0) ∧ (x - φ(x) = 0)
        if and only if ψ(x) = 0

where

ψ(x) = (2 - x) + (x - φ(x))






The two functions φ(x) and ψ(x) are of no importance except that they serve to show that prim is a recursive property.
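As a sketch, φ and ψ transcribe as follows, with [x/y]' the total quotient defined in Section 1.6:

```python
monus = lambda a, b: a - b if a >= b else 0
quot = lambda x, y: 0 if y == 0 else x // y        # [x/y]'

def phi(x):
    """Smallest y such that x <= 1, or y > 1 and y divides x."""
    y = 0
    while monus(x, 1) * (monus(2, y) + monus(x, y * quot(x, y))) != 0:
        y += 1
    return y

psi = lambda x: monus(2, x) + monus(x, phi(x))     # x prime iff psi(x) = 0

print([x for x in range(20) if psi(x) == 0])  # [2, 3, 5, 7, 11, 13, 17, 19]
```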

The reader will be familiar with the sequence of prime numbers

2, 3, 5, 7, 11, ...

We wish to put 2 = p(0), 3 = p(1), etc. We thus write p(x) for the (x + 1)st prime. The function y = p(x) may be defined recursively thus:

p(0) = 2
p(S(x)) = q(x, p(x))

where

q(x, u) = smallest y such that χ(u, y) = 0

and

χ(u, y) = ψ(y) + ((u + 1) - y)

(Here χ(u, y) = 0 says that y is a prime greater than u.) We must check that ∀u ∃y, χ(u, y) = 0. This amounts to showing that for each natural number x there exists a prime number y which is greater than p(x). This fact was shown by Euclid, who observed that

1 + p(0) * p(1) * ... * p(x)

is not divisible by the first x + 1 prime numbers p(0), p(1), ..., p(x) (since it leaves a remainder of 1 when divided by any of them). Knowing that it must be divisible by some prime number y (every integer > 1 is) he deduced that y > p(x).

Therefore y = p(x) is a recursive function. Since all recursive functions are calculable, it follows that there is a program to calculate y = p(x) on the abacus. If the reader wishes to take the trouble, he could draw a flow diagram for this program, making use of known subroutines.

We can also calculate the function

π(x) = number of primes < x

We note that this has the following recursive definition:

π(0) = 0
π(S(x)) = π(x) + δ(ψ(x))

where

δ(ψ(x)) = 1 if ψ(x) = 0, that is, x is a prime
        = 0 otherwise
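The recursions for p(x) and π(x) run directly as a sketch; for brevity the primality test below uses trial division in place of ψ, which would do the same job:

```python
def psi(x):
    """0 iff x is prime (a stand-in for the text's psi)."""
    return 0 if x > 1 and all(x % d for d in range(2, x)) else 1

def p(x):
    """p(0) = 2; p(S(x)) = smallest prime y greater than p(x)."""
    if x == 0:
        return 2
    u = p(x - 1)
    y = u + 1
    while psi(y) != 0:      # chi(u, y) = psi(y) + ((u + 1) - y)
        y += 1
    return y

def pi(x):
    """pi(0) = 0; pi(S(x)) = pi(x) + delta(psi(x))."""
    return 0 if x == 0 else pi(x - 1) + (1 if psi(x - 1) == 0 else 0)

print([p(x) for x in range(6)], pi(12))  # [2, 3, 5, 7, 11, 13] 5
```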






One may also define p(x) in terms of π(x) by noting that

p(x) = smallest y such that x < π(y + 1)

For technical reasons we shall later require a certain function in two variables exp(x, y), but we prefer to write exp_x(y). By this we mean the exponent of x in y, that is, the greatest z for which x^z divides y. For example, exp_5(100) = 2, since 5^2 divides 100 but 5^3 does not. In fact, we may put

exp_x(y) = smallest z such that ¬(x^{z+1} | y)

utilizing the minimization scheme, at least when x ≠ 1 and y ≠ 0. The only trouble is that Condition (*) of Section 1.5 may be violated: x^{z+1} may divide y for all z. This will indeed be the case when x = 1 or y = 0. We arbitrarily put exp_x(y) = 0 in these two exceptional cases. We now end up with the following definition:

exp_x(y) = smallest z such that x = 1 ∨ y = 0 ∨ ¬(x^{z+1} | y)
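A sketch of exp_x(y), with the two exceptional cases handled as in the definition:

```python
def divides(a, b):
    """a | b, with 0 | b only when b = 0."""
    return b == 0 if a == 0 else b % a == 0

def exp(x, y):
    """exp_x(y): the smallest z such that x^(z+1) does not divide y."""
    if x == 1 or y == 0:
        return 0               # the two exceptional cases
    z = 0
    while divides(x ** (z + 1), y):
        z += 1
    return z

print(exp(5, 100), exp(2, 96))  # 2 5
```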

Exercises 

1. Draw a flow diagram for a program which will calculate the function ψ(x) of the text, which has the property that x is a prime if and only if ψ(x) = 0.

2. Using the above exercise as a subroutine, exhibit the flow diagram of a program which will calculate π(x), the number of primes less than x.

3. Using either of the above, draw the flow diagram of a program for calculating p(x), the (x + 1)st prime.






Chapter 2 

More about calculability 



2.1 What is a program? 

In Section 1.1 we described programs for an abacus informally with the help of flow diagrams. We now wish to give a precise definition. Let us begin by looking at an example. We recall the program "transfer content of X to Y":

[Flow diagram: start → X-; the "yes" arrow leads to Y+ and back to X-; the "no" arrow leads to stop.]


Let us assign numerical labels to the nodes of this flow diagram (which correspond to the instructions of the program) other than those already labelled "start" and "stop":

[Flow diagram: start → X- (node 1); the "yes" arrow leads to Y+ (node 2) and back to node 1; the "no" arrow leads to stop.]
The program can now be represented by the following table: 





            start   1       2
yes         1       2       1
no          1       stop    1
location            X       Y
action              minus   plus






This table tells us the following: At node 1, carry out the operation X-. Can it be done? Yes, then go to node 2; no, then go to node stop. At node 2, carry out the operation Y+. Can it be done? Yes, then go to node 1; no, then also go to node 1. (Of course the negative alternative cannot really occur.) At node start do nothing to any location and go to node 1 in any case.

Suppose, for example, we are given the following table: 





            start   1       2       3
yes         2       stop    3       2
no          2       stop    3       1
location            A       B       C
action              minus   plus    minus



It is clear that this can only correspond to the following flow diagram, probably representing a pretty stupid program:

[Flow diagram corresponding to the table, lost in transcription.]



Let us write "α" for "yes", "β" for "no", "γ" for "location" and "ε" for "action". Then α, β, γ and ε are clearly functions. To describe the source and target of each of these functions, we let L be the set of locations, and we assume that there is given a finite set N of nodes,

N = {start, 1, 2, ..., k, stop}

Then

α : N - {stop} → N - {start}
β : N - {stop} → N - {start}
γ : N - {stop, start} → L
ε : N - {stop, start} → {plus, minus}.

(Here N - {stop} is short for {start, 1, 2, ..., k}, etc.)

A program may now be defined as a quadruple (α, β, γ, ε) subject to the restriction that α(n) = β(n) if n = start or if ε(n) = plus.
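This definition is concrete enough to execute. Here is a sketch in which a program is exactly such a quadruple of tables (the transfer program above), and one step of the calculation consults them as described:

```python
# A program as a quadruple (alpha, beta, gamma, epsilon), given as dicts.
# Nodes: "start", 1, 2, ..., k, "stop".  This is the transfer program.
alpha = {"start": 1, 1: 2, 2: 1}       # where to go on "yes"
beta  = {"start": 1, 1: "stop", 2: 1}  # where to go on "no"
gamma = {1: "X", 2: "Y"}               # which location each node acts on
eps   = {1: "minus", 2: "plus"}        # which action each node performs

def run(alpha, beta, gamma, eps, config):
    node = "start"
    while node != "stop":
        if node == "start":
            node = alpha[node]
        elif eps[node] == "plus":
            config[gamma[node]] = config.get(gamma[node], 0) + 1
            node = alpha[node]
        elif config.get(gamma[node], 0) > 0:   # minus, location nonempty
            config[gamma[node]] -= 1
            node = alpha[node]
        else:                                  # minus, location empty
            node = beta[node]
    return config

print(run(alpha, beta, gamma, eps, {"X": 2, "Y": 3}))  # {'X': 0, 'Y': 5}
```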






Exercises 

1. Make up a table for the program "copy A into B".

2. Draw the flow diagram corresponding to the following table: 





            start   1       2       3       4
yes         1       2       stop    4       3
no          1       2       3       4       stop
location            A       B       C       B
action              plus    minus   plus    minus



2.2 Calculable functions are recursive 

Let there be given a program (α, β, γ, ε) for the abacus. We have talked about the finite set of nodes

N = {start, 1, 2, ..., k, stop}

and the infinite set of locations

L = {A, B, C, ...}

For our present purpose it will be more convenient if both nodes and locations are denoted by numbers, thus we let

N = {0, 1, 2, ..., k, k + 1}
L = {1, 2, 3, ...}

A stage of a calculation is described by the node n and the configuration (n_1, n_2, ...), where n_i is the number of pebbles at the location i. It is understood that all but a finite number of locations are empty, so that all but a finite number of the n_i are zero. With each stage we associate its so-called Gödel number

s = 2^n * 3^{n_1} * 5^{n_2} * ...

where all but a finite number of factors are equal to 1.

Given any positive integer s, by factoring it into prime powers, we can read off the stage whose Gödel number it is (assuming the exponent of 2 does not exceed k + 1). Note that the factorization of a positive integer into prime powers is unique.

What does our program (α, β, γ, ε) do to a stage of a calculation? It determines the next stage; in particular,
C-1. If n = 0, it changes node n to node α(n).

C-2. If 0 < n ≤ k and ε(n) = plus, it adds one pebble to location γ(n) and changes node n to node α(n).

C-3. If 0 < n ≤ k, ε(n) = minus and γ(n) is nonempty, it takes one pebble from location γ(n) and changes node n to node α(n).

C-4. If 0 < n ≤ k, ε(n) = minus and γ(n) is empty, it leaves the configuration unchanged, but changes node n to node β(n).

C-5. If n = k + 1, there is no next stage.

Assuming that n ≤ k, so that there is a next stage, let its Gödel number be G(s). Then, putting n = exp_2(s), we obtain

G(s) = s * 2^{α(0)}                 if n = 0
G(s) = s * 2^{α(n)-n} * p(γ(n))     if n > 0 and ε(n) = plus
G(s) = s * 2^{α(n)-n} / p(γ(n))     if n > 0, ε(n) = minus and p(γ(n)) | s
G(s) = s * 2^{β(n)-n}               if n > 0, ε(n) = minus and ¬(p(γ(n)) | s)

So far, G(s) is undefined when n ≥ k + 1 and also when s = 0. Let us arbitrarily put

G(s) = 0 if s = 0 or n ≥ k + 1

With a little bit of extra work, one can rewrite the above as follows:

G(s) = g_1(s) if f_1(s) = 0
G(s) = g_2(s) if f_2(s) = 0
G(s) = g_3(s) if f_3(s) = 0
G(s) = g_4(s) if f_4(s) = 0
G(s) = 0      if s = 0 or exp_2(s) ≥ k + 1

where g_1, g_2, g_3, g_4 and f_1, f_2, f_3, f_4 are recursive functions, for example, g_1(s) = s * 2^{α(0)} and f_1(s) = exp_2(s). The others are a little harder. Now for each s, exactly one of the five cases holds, hence

G(s) = g_1(s)δ(f_1(s)) + ... + g_4(s)δ(f_4(s))

and so G(s) is seen to be a recursive function.

Iterating the function G(s), we obtain the function G^t(s) which may be regarded as a function of two variables and is defined recursively thus:

G^0(s) = s
G^{S(t)}(s) = G(G^t(s))






What do we mean by saying that the program (α, β, γ, ε) calculates the function y = f(x)? Take X = 1 and Y = 2; then the initial and final configurations are given by

1   2
x   0
x   f(x)

and hence the Gödel numbers of the initial and final stages are

s = 2^0 * 3^x * 5^0,   G^{m(x)}(s) = 2^{k+1} * 3^x * 5^{f(x)}

where

m(x) = smallest t such that exp_2(G^t(3^x)) = k + 1

Therefore

f(x) = exp_5(G^{m(x)}(3^x))

Since f(x) is constructed from known recursive functions with the help of the minimization and substitution schemes, it follows that f(x) is recursive. We have thus proved the following:

2.2.1 THEOREM Every calculable numerical function is recursive. 

As we have already established the converse of this in Section 1.6, we conclude that 
"calculable" and "recursive" are synonymous terms. 

Are there any numerical functions which are not calculable or recursive? We shall 
answer this question after a short digression. 
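The arithmetization can be watched in action. The sketch below hard-codes the transfer program of Section 2.1 (nodes 0 = start, 1 = X-, 2 = Y+, 3 = stop; locations 1 and 2 coded by the primes 3 and 5) and iterates G; handling the negative exponents by integer division is our own device:

```python
PRIMES = [2, 3, 5]    # p(i) codes location i; the exponent of 2 codes the node

def expo(p, s):
    """The exponent of the prime p in s (s > 0)."""
    z = 0
    while s % p == 0:
        s //= p
        z += 1
    return z

# The transfer program, k = 2:
alpha = {0: 1, 1: 2, 2: 1}
beta  = {0: 1, 1: 3, 2: 1}
gamma = {1: 1, 2: 2}
eps   = {1: "minus", 2: "plus"}

def G(s):
    if s == 0:
        return 0
    n = expo(2, s)
    if n >= 3:                      # the stop node k + 1: no next stage
        return 0
    if n == 0:
        return s * 2 ** alpha[0]
    p, shift = PRIMES[gamma[n]], alpha[n] - n
    if eps[n] == "plus":
        s *= p
    elif s % p == 0:                # minus, and the location is nonempty
        s //= p
    else:                           # minus, and the location is empty
        shift = beta[n] - n
    return s * 2 ** shift if shift >= 0 else s // 2 ** (-shift)

s = 3 ** 2 * 5 ** 3                 # start node, 2 pebbles at 1, 3 pebbles at 2
while expo(2, s) != 3:              # iterate G until the stop node is reached
    s = G(s)
print(expo(3, s), expo(5, s))       # 0 5
```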

Exercises 

1. A calculation has arrived at node 5 of the flow diagram of a certain program. There are 3 pebbles at A, 1 pebble at C and no pebbles at any other location. What is the Gödel number of this stage of the calculation?

2. Assuming that a certain program contains enough instructions (nodes), describe the stage of a calculation corresponding to the Gödel number 90,000.

2.3 Digression: enumerations 

A set X is said to be countable (or enumerable) if it may be put into one-to-one correspondence with the set of natural numbers N, that is, if there is a function f : N → X which is one-to-one and onto.






For example, the following sets are countable. 

{0, 1, 2, 3, . . .} = set of natural numbers 

{1, 2, 3, 4, . . .} = set of positive integers 

{0, 2, 4, 6, . . .} = set of even natural numbers 

{0, 1, 4, 9, . . .} = set of perfect squares 

{2, 3, 5, 7, . . .} = set of prime numbers 

The enumeration function f in these examples is given by

f(n) = n, n + 1, 2n, n^2, p(n)

all of which are recursive functions. It is easily seen that every infinite subset X of N is countable, although the function f : N → X will not in general be recursive, as will be seen later.

A countable set X need not be a subset of N. For example, the set of integers is 
countable: 

{0,1,-1,2,-2,...} 

A more interesting example is the set N x N of all pairs of natural numbers. This is 
enumerated thus: 

(0,0), 

(0,1), (1,0), 
(0,2), (1,1), (2,0), 
(0,3), (1,2), (2,1), (3,0), 



We first take all pairs with sum 0, then all pairs with sum 1, etc. Now the number of entries in the first n lines of the above array is 1 + 2 + 3 + ... + n = n(n + 1)/2. It is easily checked that the function f : N × N → N is given by f(x, y) = ((x + y)(x + y + 1)/2) + x, clearly a recursive function in two variables.
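A sketch of this pairing, together with an inverse that walks back down the diagonals:

```python
def pair(x, y):
    """Cantor's enumeration of N x N: the position of (x, y) in the array."""
    return (x + y) * (x + y + 1) // 2 + x

def unpair(n):
    """Recover (x, y) from its position n."""
    s = 0
    while (s + 1) * (s + 2) // 2 <= n:   # find the diagonal x + y = s
        s += 1
    x = n - s * (s + 1) // 2
    return (x, s - x)

print(pair(1, 2), unpair(5))  # 7 (2, 0)
```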

As our next example, consider the set of all finite sequences of natural numbers. With each finite sequence (a_0, a_1, ..., a_n) we associate the number 2^{a_0+1} * 3^{a_1+1} * ... * p(n)^{a_n+1}. This gives a one-to-one correspondence between the set of finite sequences of natural numbers and a subset of N.

An example of a set which is not countable is the set of all properties of natural numbers. Suppose this set can be enumerated; then all properties of natural numbers are contained in the sequence P_0, P_1, P_2, .... Let us now define a new property P as follows: P(x) if and only if ¬P_x(x). We don't care which natural numbers have property P, but we do know that P is not in the above sequence. For suppose P = P_k; then

P(k) if and only if ¬P_k(k)
     if and only if ¬P(k)






clearly a contradiction. It follows that the set of all properties of natural numbers cannot 
be enumerated. 

The above argument is due to Cantor, and it is known as Cantor's diagonal argument. Another application of this argument is to the set of all functions N → N. Suppose this set can be enumerated thus: f_0, f_1, f_2, .... Let us now define a function f : N → N by putting f(x) = f_x(x) + 1. Suppose this function were a member of the sequence; then f = f_k, for some k ∈ N. Hence

f_k(x) = f_x(x) + 1

for all x ∈ N. Taking x = k, we get the contradiction

f_k(k) = f_k(k) + 1

Therefore the set of functions N → N cannot be enumerated.

A third application of Cantor's diagonal argument deals with the set of all real numbers x such that 0 ≤ x < 1. Suppose this set can be enumerated thus:

a_1 = 0.a_11 a_12 a_13 ...
a_2 = 0.a_21 a_22 a_23 ...
a_3 = 0.a_31 a_32 a_33 ...

where the a_ij are digits from 0 to 9. It will be convenient to assume that none of these decimal expansions involve an infinite sequence of nines, since every such expansion can be replaced by another one with an infinite sequence of zeros. (For example, 0.1999... = 0.2000....) Let us now construct a number

b = 0.b_1 b_2 b_3 ...

where b_1 is neither a_11 nor 9, b_2 is neither a_22 nor 9, and so on. Then b ≠ a_1 since they differ in the first decimal place, b ≠ a_2 since they differ in the second decimal place, b ≠ a_3 since they differ in the third decimal place, and so on. But 0 ≤ b < 1, so we must have b = a_n for some n, a contradiction. It follows that the set of real numbers in the unit interval cannot be enumerated.

We shall meet yet another application of Cantor's diagonal argument in the next 
section. 

Exercises 

1. Explain why each of the following sets is countable: 
a. the set of odd natural numbers, 






b. the set of cubes of natural numbers, 

c. the set of odd integers, 

d. the set of rational numbers, 

e. the set of 3-tuples of natural numbers. 

2. Explain why the following sets are not countable: 

a. the set of real numbers, 

b. the set of complex numbers, 

c. the set of irrational numbers. 



2.4 Example of a noncalculable function 

Every calculable numerical function N → N can be calculated with the help of some program (α, β, γ, ε). The program may be described explicitly as an array of numbers, provided we encode plus and minus as numbers, for example, as 1 and 0 respectively.



1        ...   k        k + 1
α(1)     ...   α(k)     α(k + 1)
β(1)     ...   β(k)     β(k + 1)
γ(1)     ...   γ(k)     γ(k + 1)
ε(1)     ...   ε(k)     ε(k + 1)



It is clear that these arrays can be enumerated, hence the set of programs is countable. Thus the set of calculable functions N → N is countable. On the other hand, the set of all numerical functions N → N is not countable, as we saw in the previous section. Therefore not all numerical functions are calculable. We shall actually construct a numerical function which is not calculable.

Let P_0, P_1, P_2, ... be an enumeration of all programs. Suppose an initial configuration of x pebbles at location 1 and 0 pebbles at location 2 is transformed by program P_n to a final configuration of x' pebbles at location 1 and y pebbles at location 2. (We do not care what happens at other locations.) Then we shall write f_n(x) = y and say the program P_n calculates f_n at x. If the calculation does not terminate, we shall say that f_n(x) is undefined. Thus f_n(x) is a partial recursive function in x. (Note that in the present context we do not insist that x' = x as in an earlier chapter. However, it is easily seen that some slightly modified program would calculate f_n(x) in the earlier sense.) Writing f(x, n) = f_n(x), we obtain a partial recursive function in two variables.

Let us say that n has property G if the above calculation terminates for each x, that is, if f_n(x) is a function, not just a partial function. Let g(y) be a numerical function which enumerates all numbers with property G. For example, we may write






which enumerates all numbers with property G. For example, we may write 

( g(0) = smallest n such that G(n) 

\ g(S(y)) = smallest n such that n > g(y) A G(n) 

The right hand side of this last equation is well defined, since there are evidently 
infinitely many natural numbers with property G, as there are infinitely many calculable 
functions. Thus g(y) is a numerical function which enumerates all natural numbers with 
property G. 

While f(x, n) is only a partial function,

h(x, y) = f(x, g(y)) = f_{g(y)}(x)

is a genuine function of two variables, since g(y) belongs to the set G. From this we obtain the function h(x, x) of one variable by diagonalizing, that is, by putting y = x. We claim that the function h(x, x) is not calculable.

Indeed, suppose it is calculable; then so is h(x, x) + 1. Let P_m be a program which calculates the latter function. Clearly, m has property G, hence we may write m = g(k), for some natural number k. Therefore

h(x, x) + 1 = f_m(x) = f(x, m) = f(x, g(k)) = h(x, k)

Now, this is true for all x ∈ N. Taking x = k, we get

h(k, k) + 1 = h(k, k)

a contradiction.

We have thus exhibited a function h(x, x) which is not calculable on an abacus. We may squeeze a little more out of our argument.

It can be shown that f(x, n) is a partial recursive function. The proof of this fact is not difficult, but a little tedious, and we shall skip it. Let us assume this result. We claim it then follows that g(y) is not a recursive function.

Indeed, suppose g(y) is recursive. Then so is h(x, y) = f(x, g(y)), hence also h(x, x). But we know that h(x, x) is not calculable; in other words, it is not recursive. Thus g(y) is not recursive. Since g(y) was any numerical function which enumerates all natural numbers with property G, it follows that no recursive function can enumerate these numbers. We shall return to this statement in Section 2.6.

The program for calculating f(x, y) may be used as a super-program for calculating all partial recursive functions in one variable, since every such partial function has the form f_n(x) = f(x, n) for some n.






2.5 Turing programs 

We have seen how to perform calculations on an abacus consisting of an arbitrary number 
of locations, each location being capable of storing arbitrarily many pebbles. One can 
overcome one's misgivings about an infinite number of locations by reassuring oneself that 
it is always possible to dig another hole in the ground when needed. But holes of infinite 
depth are less easy to accept. 

Let us see what happens if we restrict the storage capacity of each location to be at most m pebbles. One has in mind the classical abacus with m = 9, to facilitate representations of numbers in the scale of 10. But we may as well go all the way and take m = 1. Thus each location can hold at most one pebble. Let us call this the binary abacus, because each location must be in one of two states: empty or full.

The elementary instruction X+ of the original abacus makes no sense for the binary abacus, as one cannot put a pebble into a full location. Instead we shall adopt a new elementary instruction X+ with two exit arrows. If X is empty, we are asked to put a pebble into location X and proceed along the left arrow. If X is full, we are asked to leave X alone and proceed along the right arrow. We may now enquire whether every calculation that can be performed on the original abacus can also be carried out on the binary abacus. Instead of arranging the locations in a single sequence

A_1  A_2  A_3  ...

we may put them in a double array as follows:

A_13   A_23   A_33   ...
A_12   A_22   A_32   ...
A_11   A_21   A_31   ...


Instead of having n pebbles at location A_1, we may have one pebble each at locations A_11, A_12, ..., A_1n. (It is convenient to represent the number n in this way, and not in the scale of 2.) The old instruction A_1+ may now be translated into a program for putting one pebble into the first empty location in the column A_11, A_12, .... What does such a program look like?



[Flow diagram: start → A_11+; if A_11 was empty, stop; if it was full, go to A_12+; if A_12 was empty, stop; if it was full, go to A_13+; and so on.]

Unfortunately, this will be an infinite flow diagram, unless we change our method of describing programs. Indeed, the following finite flow diagram will explain what we have in mind:

[Flow diagram: start → +; if the location was empty, stop; if it was full, go "up" and return to +.]

Here "up" means "one location higher". 

Turing's idea is even more radical. He arranges the locations in one line extending infinitely in both directions. Thus the locations are in a natural one-to-one correspondence with the integers:

..., A_{-2}, A_{-1}, A_0, A_1, A_2, ...

Moreover he never allows you to move from one location to another, unless the new location is adjacent to or coincides with the old location.

A Turing program is built up from the following elementary instructions:

[Diagram: start at X; the instructions +right, +left, +stay and -right, -left, -stay, each with two exit arrows; and stop.]



The first instruction says: start at location X. The second instruction says: add one 
pebble to the location at which you are, if it doesn't have one already, then move to the 
location on the right. If this new location is full, go to the node indicated by the left 
arrow. If the new location is empty, go to the node indicated by the right arrow. 

The other instructions are interpreted similarly. 






One now represents the number x on the binary abacus by x + 1 pebbles at consecutive 
locations. (Thus zero is represented by one pebble.) One says that a function y = f(x) is 
computed by a Turing program, if the calculation set to start with a representation of 
x (the last pebble being at the starting location) ends with a representation of x followed 
by an empty location followed by a representation of f(x) (the last pebble being at the 
terminal location). 

It can be shown that a numerical function can be computed by a Turing program if 
and only if it is calculable, that is, recursive. 

Actually, Turing did not talk about programs but about machines. We shall give a 
brief description of these. 

A Turing machine consists of a central unit which is scanning one square of an infinite tape. The central unit is in one of a finite number of states:

S_0 = start, S_1, S_2, ..., S_m

Each square of the tape has written on it a word of the finite vocabulary Σ:

a_1, ..., a_n

or σ = blank. For example, we may have the following situation:

[Picture: the central unit reading and writing the scanned square of a two-way infinite tape.]

The scanned square may be read and written upon. The tape is capable of moving in either direction. The machine is capable of making the following kinds of moves:

(1) (σ, S_i) → (τ, S_j)
erase σ, print τ, change state from S_i to S_j.

(2) (σ, S_i) → (right, S_j)
move tape one square to the right (or move scanner one square to the left), change state from S_i to S_j.

(3) (σ, S_i) → (left, S_j)
move tape to the left and change state.

We assume the machine is deterministic, that is, each move is uniquely determined by (σ, S_i), σ ∈ Σ.
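A sketch of such a deterministic machine: the transition table maps (scanned word, state) to a move, and the toy machine below, our own example, walks right across a block of 1's and halts:

```python
# Moves: ("print", tau, next_state), ("left", next_state), ("right", next_state).
# For readability, "left"/"right" here describe the scanner's motion.
def run(table, tape, pos, state, halt):
    tape = dict(enumerate(tape))           # two-way infinite tape, blank = "0"
    while state != halt:
        move = table[(tape.get(pos, "0"), state)]
        if move[0] == "print":
            tape[pos] = move[1]
        elif move[0] == "right":
            pos += 1
        else:
            pos -= 1
        state = move[-1]
    return tape, pos

# In state S0: on "1" move right and stay in S0; on the first blank, mark and halt.
table = {("1", "S0"): ("right", "S0"), ("0", "S0"): ("print", "X", "halt")}
tape, pos = run(table, ["1", "1", "1"], 0, "S0", "halt")
print(tape, pos)  # {0: '1', 1: '1', 2: '1', 3: 'X'} 3
```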






Without going into technical details we shall say this: if the machine is presented with 
a string of words on the tape when the central unit is in the state start and ultimately 
comes to a stop, then the string is said to have been accepted by the machine. One 
may ask, what sets of strings consist precisely of the acceptable strings of some Turing 
machine? Not surprisingly, the answer is: the recursively enumerable sets discussed in 
Section 2.6 below. 

To compare Turing programs with Turing machines, we take Σ = {0, 1} and adopt the following dictionary:

location = square on two-way infinite tape
pebble = printing of symbol "1"
program = machine
node = internal state
location at one stage of calculation = scanned square
The proof that a function is computable on a Turing machine if and only if it is 
recursive may be found in the book by Kleene. Alternatively, one may prove directly that 
a function is computable on a Turing machine if and only if it is calculable on the abacus; 
such a proof appears in the recent book by Boolos and Jeffrey. 

2.6 Effective procedures 

Mathematicians believe that a numerical function which can be effectively calculated or computed by any means whatsoever is calculable or computable in the technical sense of these words, hence recursive. This statement, known as the Church-Turing Thesis, cannot be proved, as we do not know what is meant by "effectively computable by any means whatsoever".

Given a set P of natural numbers, we may ask this: Is there an effective procedure for deciding whether any natural number n is an element of P? As a consequence of the Church-Turing Thesis we assert that there is such an effective procedure if and only if P is a recursive set, that is, there exists a calculable function f : N → N such that

∀x ∈ N, x ∈ P if and only if f(x) = 0.

A related concept is embodied in the following. 

2.6.1 Definition A set P of natural numbers is recursively enumerable if it is the image of a recursive function, that is, if there is a recursive function f : N → N such that P = {f(0), f(1), f(2), ...}, with repetitions permitted.






In particular, every finite set of natural numbers is recursively enumerable. For example, the set {a, b, c, d} is the image of the function

f(x) = a if x = 0
     = b if x = 1
     = c if x = 2
     = d if x ≥ 3

This function is recursive, since

f(x) = a * δ(x) + b * δ(|x - 1|) + c * δ(|x - 2|) + d * δ(3 - x)

2.6.2 Theorem A set P of natural numbers is recursive if and only if P and its 
complement N — P are each recursively enumerable or finite. 

Proof Assume that P is recursive. If P is infinite, we may define a function g as follows: 

j gifS) = smallest y such that y G P 

\ g(S(x)) = smallest y such that y G P A y > g(x) 

Clearly P = {<?(0), <?(1), . . •}, and so g affords a recursive enumeration of P. 

If N − P is infinite, we may similarly define a function h which will recursively enumerate N − P:

h(0) = smallest y such that y ∈ N − P
h(S(x)) = smallest y such that y ∈ N − P ∧ y > h(x)

Conversely, let us assume first that P and N − P are both recursively enumerable, say g enumerates P and h enumerates N − P. Since every natural number x is an element of P or of N − P, there exists a number y such that x = g(y) or x = h(y). Put

e(x) = smallest y such that x = g(y) ∨ x = h(y)

Then x ∈ P if and only if g(e(x)) = x, that is, f(x) = 0, where f(x) = |g(e(x)) − x|.
Next, let us assume that P is recursively enumerable and N − P is finite, say N − P = {b₁, b₂, . . . , bₘ}. Then we define g as above and put

e(x) = smallest y such that x = g(y) ∨ x = b₁ ∨ x = b₂ ∨ . . . ∨ x = bₘ

Then we proceed as above. There remains the case when N − P is recursively enumerable and P is finite, and this case is treated quite similarly.
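The effective content of this proof is easily displayed in code: if g enumerates P and h enumerates N − P, then to decide whether x ∈ P we search the lists g(0), g(1), . . . and h(0), h(1), . . . until x turns up in one of them. The sketch below is ours, and the enumerations in the demo are invented.

    # Decision procedure extracted from the proof of Theorem 2.6.2.
    def member(x, g, h):
        y = 0
        while True:              # terminates, since x = g(y) or x = h(y) for some y
            if g(y) == x:
                return True      # x appears in the enumeration of P
            if h(y) == x:
                return False     # x appears in the enumeration of N - P
            y += 1

    # Invented example: P = even numbers, enumerated by g(y) = 2y,
    # with complement enumerated by h(y) = 2y + 1.
    print(member(10, lambda y: 2 * y, lambda y: 2 * y + 1))   # True
    print(member(7,  lambda y: 2 * y, lambda y: 2 * y + 1))   # False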

To illustrate the above notions, recall from Section 2.4 that a natural number n has property G if the calculation of fₙ(x) by the program Pₙ terminates. We may as well regard G as a set; what we showed is that no recursive function can enumerate G.






In the present terminology, this means that G is not recursively enumerable. It follows 
from the above theorem that G is also not recursive. 

In many applications of these ideas we are interested not in natural numbers, but in 
strings which are made up of symbols of a finite alphabet or words of a finite vocabulary. 
By putting these strings in alphabetic or lexicographic order, we obtain a one-to-one 
correspondence between strings and natural numbers. We may then speak of recursive 
and recursively enumerable sets of strings. 
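The correspondence can be made quite explicit: list the strings by length, and alphabetically within each length. A minimal sketch in Python (ours):

    # Correspondence between strings over a finite alphabet and natural numbers:
    # 0, 1, 2, 3, ... correspond to '', 'a', 'b', 'aa', ... for the alphabet [a, b].
    def number_to_string(n, alphabet):
        k, s = len(alphabet), []
        while n > 0:
            n, r = divmod(n - 1, k)
            s.append(alphabet[r])
        return "".join(reversed(s))

    def string_to_number(s, alphabet):
        n = 0
        for c in s:
            n = n * len(alphabet) + alphabet.index(c) + 1
        return n

    ab = ["a", "b"]
    print([number_to_string(i, ab) for i in range(7)])  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
    print(string_to_number("ba", ab))                   # 5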

For example, let us consider the finite vocabulary {(, ), p, q, r, ′, ¬, ⇒, ∨, ∧}, which will allow us to deal with the propositional calculus, the propositional variables being p, q, r, p′, q′, r′, p″, etc.

It is easily seen which strings are formulas (e.g. ((p ⇒ q) ∧ r)) and which are not (for example pq ⇒ (′)). Thus the set of formulas is recursive. So is the set of tautologies, because the method of truth-tables allows us to decide effectively which formulas are tautologies.
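The truth-table method is easily mechanized. In the sketch below (ours), a formula is represented as a variable name, a pair ('not', A), or a triple (op, A, B); this nested-tuple representation is a convenience of ours, not the book's notation.

    from itertools import product

    def variables(f):
        if isinstance(f, str):
            return {f}
        return set().union(*(variables(g) for g in f[1:]))

    def value(f, v):                       # truth value of f under the assignment v
        if isinstance(f, str):
            return v[f]
        if f[0] == 'not':
            return not value(f[1], v)
        a, b = value(f[1], v), value(f[2], v)
        return {'imp': (not a) or b, 'or': a or b, 'and': a and b}[f[0]]

    def tautology(f):                      # try every row of the truth table
        vs = sorted(variables(f))
        return all(value(f, dict(zip(vs, row)))
                   for row in product([False, True], repeat=len(vs)))

    print(tautology(('or', 'p', ('not', 'p'))))        # True:  p or not-p
    print(tautology(('or', ('imp', 'p', 'q'), 'r')))   # False: (p implies q) or r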

To get a language adequate for all mathematics, we adjoin to the above vocabulary the set {x, y, z, ∀, ∃, =, ∈}. Again, the set of formulas is recursive. However, it can
be shown that the set of theorems is not recursive, that is, there is no effective method 
for deciding whether a formula is a theorem. This is just as well, or mathematicians 
could be replaced by computers. Still, the set of theorems is easily seen to be recursively 
enumerable. It then follows from the above theorem that the set of formulas which are 
not theorems is not recursively enumerable. 

As another example, we may take as our vocabulary the set of all English words, say 
all the words listed in the Oxford dictionary. This set is quite large, but still finite. We 
are then interested in those strings which are grammatical sentences. Linguists are agreed 
that the set of grammatical sentences is recursively enumerable. One might even expect 
that this set is recursive. 

Exercise 

1. State whether each of the following sets is 

a. recursive, 

b. recursively enumerable: 

(i) the set of theorems in mathematics, 

(ii) the set of tautologies in the propositional calculus, 

(iii) the set of statements in mathematics which are not theorems, 

(iv) the set of formulas of the propositional calculus which are not tautologies, 

(v) the set of programs for the abacus, 




(vi) the set of programs which calculate a function y = f(x).



Chapter 3 

Syntactic types and semigroups 



3.1 A simple grammar using types 

It is the aim of grammar to describe the set of sentences in a language. There are several kinds of grammars that have been studied from a mathematical point of view. We shall first consider the method of syntactic types, also known as "categorial grammar". We shall illustrate this by looking at a small fragment of English.
We first define the set of types inductively 



T-1. s is a type (s = sentence)

T-2. n is a type (n = name)

T-3. if a and b are types so is a · b

T-4. if a and b are types so is a/b (read "a over b")

T-5. if a and b are types so is a\b (read "a under b")



For the moment we shall not accept any types other than those which may be constructed from T-1 to T-5. For example, the following expressions are types, where parentheses are used as ordinarily in mathematics:

s, s/n, (s/n) · n, s/(n · n), n\(s/n)

We introduce a binary relation → between types. It is our intention to interpret "a → b" to mean that every expression of type a is also of type b. The relation → is subject to the following rules, for the time being:

Ar-1. a → a,



47 



48 



CHAPTER 3. SYNTACTIC TYPES AND SEMIGROUPS 



Ar-2. if a → b and b → c, then a → c,

Ar-3. if a → b and c → d, then a · c → b · d,

Ar-4. (a/b) · b → a,

Ar-5. a · (a\b) → b.

Later we shall consider some additional rules. 

We also use the arrow between words of the language studied and types. Thus John → n is intended to mean that John has type n. For the sake of illustration we shall consider a small fragment of English consisting of the words John, Jane, milk, works, rests, likes, calls, poor, fresh, today, and, while, with, for.

We begin by assigning types to these words. The first three words will be assigned the type n, as they are names. (Indeed, milk is the name of a substance.) If works is assigned type x, the sentence John works will have type n · x. Rule T-5 suggests we take x = n\s; then this sentence has type n · (n\s) → s. Since John likes Jane is a sentence, likes Jane should have the same type as works, that is, n\s. Rule T-4 then suggests we let likes → (n\s)/n, so we have the analysis

John   likes   Jane
 n    (n\s)/n   n

Here likes Jane has type ((n\s)/n) · n → n\s, and John likes Jane then has type n · (n\s) → s.

Continuing in this manner, we arrive at the following type assignment for our fragment 
of English: 



John   →  n                 (name)
Jane   →  n
milk   →  n
works  →  n\s               (intransitive verb)
rests  →  n\s
likes  →  (n\s)/n           (transitive verb)
calls  →  n\s, (n\s)/n
poor   →  n/n               (adjective)
fresh  →  n/n
today  →  s/s, s\s          (adverb)
and    →  (s\s)/s           (conjunction)
while  →  (s\s)/s
with   →  (s\s)/n           (preposition)
for    →  (s\s)/s, (s\s)/n








A number of examples will show that one can now check the sentence-hood of strings 
that were not used in arriving at the above type assignments. Having learned a grammar, 
one should be able to use it to construct new sentences. 

(poor John) (likes (fresh milk))
 n/n   n    (n\s)/n  n/n   n

Here poor John → (n/n) · n → n and fresh milk → (n/n) · n → n, so likes (fresh milk) → ((n\s)/n) · n → n\s, and the whole string has type n · (n\s) → s.

((John works) (for Jane)) (while (Jane rests))
  n    n\s   (s\s)/n  n   (s\s)/s   n   n\s

Here John works → s and for Jane → s\s, so (John works)(for Jane) → s · (s\s) → s; also Jane rests → s, so while (Jane rests) → ((s\s)/s) · s → s\s; finally the whole string has type s · (s\s) → s.

Of course, many meaningless sentences also turn out to be grammatical according to our 
rules, for example: 

poor milk calls fresh John 

milk rests for milk while milk works 
On the other hand, some English sentences have escaped our net, for example, 

Jane calls John fresh 

This could be remedied by adding some further type assignments to the above list. 
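Checking sentencehood by these rules can itself be mechanized: repeatedly contract an adjacent pair (a/b, b) to a, or (a, a\b) to b, backtracking over the possible orders of contraction. The following sketch is ours, and includes only part of the lexicon above.

    # Types: 'n', 's', ('/', a, b) for a/b, ('\\', a, b) for a\b.
    def reduces_to_s(types):
        if list(types) == ['s']:
            return True
        for i in range(len(types) - 1):
            x, y = types[i], types[i + 1]
            if isinstance(x, tuple) and x[0] == '/' and x[2] == y:    # (a/b) b -> a
                if reduces_to_s(types[:i] + [x[1]] + types[i + 2:]):
                    return True
            if isinstance(y, tuple) and y[0] == '\\' and y[1] == x:   # a (a\b) -> b
                if reduces_to_s(types[:i] + [y[2]] + types[i + 2:]):
                    return True
        return False

    lexicon = {
        'John': 'n', 'Jane': 'n', 'milk': 'n',
        'works': ('\\', 'n', 's'), 'likes': ('/', ('\\', 'n', 's'), 'n'),
        'poor': ('/', 'n', 'n'), 'fresh': ('/', 'n', 'n'),
    }
    print(reduces_to_s([lexicon[w] for w in 'poor John likes fresh milk'.split()]))  # True
    print(reduces_to_s([lexicon[w] for w in 'John likes'.split()]))                  # False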

Let us try to adjoin the pronouns he and him to our fragment of English. Since we cannot say poor him works and Jane likes he, we must not assign the type n to he and him. Attempting to justify the sentences he works and Jane (likes him), we are led to the following assignments:

he → s/(n\s)
him → ((n\s)/n)\(n\s)

The type of him seems rather complicated. Had we allowed the constructions (Jane likes) John and (Jane likes) him, we should have arrived at different types for likes and him, namely

likes → n\(s/n)
him → (s/n)\s

Exercise 

1. Using type assignments, check that the following are grammatical sentences: 
a. poor poor John works 






b. John works today today 

c. he works for Jane today 

d. he works, for Jane rests today 

3.2 Analysis of the English verb phrase 

It does not seem likely that we can push the method of syntactic types very far in appli- 
cations to natural languages, as long as we insist that all types be built up from s and n 
alone. In this chapter we shall make use of five primitive types: 

s = complete declarative sentence 

n = name 

i = infinitive of intransitive verb 

p = present participle of intransitive verb 

q = past participle of intransitive verb 

Accordingly we adopt the type assignments:

work → i
working → p
worked → q

Let us now look at the sentence

John (must work)
 n        i

It is clear that must work should receive the same type as works, that is, n\s. Since work has type i, we are led to assign

must → (n\s)/i

The same type will be assigned to other "modal auxiliaries" such as will and may. In the same way, by looking at the sentences

John (is working)    John (has worked)
 n        p           n        q

we are led to assign

is → (n\s)/p
has → (n\s)/q

Similarly we arrive at the following type assignments:

John must have worked,    have → i/q
 n   (n\s)/i   q

John has been working,    been → q/p
 n   (n\s)/q   p

John is calling Jane,     calling → p/n
 n   (n\s)/p    n

John must call Jane,      call → i/n
 n   (n\s)/i   n

John has called Jane,     called → q/n
 n   (n\s)/q    n

John is called,           is → (n\s)/(q/n)
 n            q/n

We summarize this information in the following table:





                        infinitive   present      past         third person singular
                                     participle   participle   present tense
intransitive verb       work         working      worked       works
                        i            p            q            n\s
transitive verb         call         calling      called       calls
                        i/n          p/n          q/n          (n\s)/n
modal auxiliary                                                must
                                                               (n\s)/i
progressive auxiliary   be                        been         is
                        i/p                       q/p          (n\s)/p
perfect auxiliary       have                                   has
                        i/q                                    (n\s)/q
passive auxiliary       be           being        been         is
                        i/(q/n)      p/(q/n)      q/(q/n)      (n\s)/(q/n)



There are of course many other transitive and intransitive verbs, even have may occur 
as a transitive verb. On the other hand, it is difficult to think of many auxiliary verbs. 
Here is presumably almost a complete list: 

modal auxiliaries: must, will, shall, can, may, would, should, could, might. 

progressive auxiliaries: be, stop, start, keep. 

passive auxiliaries: be, get. 
The blank squares in the table indicate that there do not exist forms to fit those 
slots. In the column "modal auxiliaries" we are not surprised to note the absence of the 
nonexistent forms *musting, *musted, and *to must. However, there do exist forms being, 






having, and had; so why do these not appear in the remaining three squares? Because we cannot say *John is being working, *John is having worked, *John has had worked. (An asterisk indicates that the expression to which it is attached does not appear in standard English.)

Our table could be enlarged by allowing for plural, past tense and 1st person forms. 
This project would require additional primitive types, and we shall refrain from carrying 
it out. 

Using the above table, one can deduce that many strings of words are grammatical sen- 
tences, even if these strings were not used in constructing the table. Some such sentences 
turn out to be pretty unusual, for example 

Jane   has    been   being    called
 n   (n\s)/q  q/p   p/(q/n)   q/n

Here being called → (p/(q/n)) · (q/n) → p, then been being called → (q/p) · p → q, then has been being called → ((n\s)/q) · q → n\s, and the whole string has type n · (n\s) → s.

It would seem difficult to rule this out, without also ruling out 

Jane has stopped being called 
Jane has been getting called 

Adverbs have been assigned the type s\s, as in

(John (must work)) today
  n   (n\s)/i  i    s\s

Sometimes it may seem more natural to let an adverb have type i\i, as in

John must (work well)
 n  (n\s)/i   i   i\i

However, in John works well it would then be necessary to say that works has not only the simple type n\s, but also the compound type ((n\s)/i) · i. Similarly one can distinguish between "coordinate" conjunctions and "subordinate" conjunctions, as in the following sentences:

John works and Jane rests
 n   n\s  (s\s)/s  n  n\s

John must work while Jane rests
 n  (n\s)/i  i  (i\i)/s  n  n\s






Exercise 

1. Using type assignments, show that the following are grammatical sentences. 

a. John must have been working 

b. John has stopped calling Jane 

c. Jane may have stopped getting called 

d. John is having milk 

3.3 Syntactic calculus 

We shall investigate a deductive system that will enable us to handle syntactic types in 
the same way as the propositional calculus enables us to handle propositions. 

Primitive types are s, n, i, p, q, and perhaps others, according to need. 

Types are defined inductively thus: primitive types are types, and if a and b are types so are a · b, a/b, and a\b.

A formula is an expression of the form a → b, where a and b are types.

We adopt one axiom scheme:

(1) a → a

and five rules of inference:






(2)  a → b    b → c
     --------------
          a → c

(3)  a · b → c
     ---------
      a → c/b

(4)   a → c/b
     ---------
     a · b → c

(5)  a · b → c
     ---------
      b → a\c

(6)   b → a\c
     ---------
     a · b → c






A formula is said to be a theorem if it can be deduced from the axioms and rules of 
inference by means of a proof. We shall illustrate what we mean by a proof by looking 
at some examples. 



  a/b → a/b
------------- (4)
(a/b) · b → a

  a\b → a\b
------------- (6)
a · (a\b) → b



These two proofs show that rules Ar-4 and Ar-5 of Section 3.1 are in fact theorems of the syntactic calculus. The theorems established by the following two proofs are new:



a · b → a · b
------------- (3)
a → (a · b)/b

a · (a\b) → b
------------- (3)
a → b/(a\b)

For example, the last proof establishes the theorem n → s/(n\s). This asserts that every name also has the type s/(n\s), the type of the pronoun he in Section 3.1. Indeed, in any context, he may be replaced by John, but not conversely. (It may also be replaced by Jane; distinctions of gender do not enter the very limited grammar discussed in Section 3.1.)
Rule Ar-3 of Section 3.1 does not appear among our present rules of inference. However, it can be derived: given a → b and c → d, there is a proof which leads from these assumptions to a · c → b · d. Here it is, a good example of a proof in tree form:



given         b · c → b · c              given         b · d → b · d
a → b         ------------- (3)          c → d         ------------- (5)
              b → (b · c)/c                            d → b\(b · d)
------------------------- (2)            ------------------------- (2)
      a → (b · c)/c                            c → b\(b · d)
      ------------- (4)                        ------------- (6)
      a · c → b · c                            b · c → b · d
                 ---------------------------------- (2)
                           a · c → b · d

We shall write this derived rule for future reference:

(a)  a → b    c → d
     --------------
      a · c → b · d

Other derived rules of inference are:

     a → b    c → d
     --------------
       a/d → b/c






     a → b    c → d
     --------------
       d\a → c\b

In applying the syntactic calculus to English, we assumed that English sentences are structured in a certain way. For example, we wrote John (likes Jane) but not *(John likes) Jane, and thus we arrived at the type (n\s)/n for likes and not n\(s/n). As we shall see, these two types are equivalent in the associative syntactic calculus.

The associative syntactic calculus differs from the syntactic calculus by virtue of two additional axiom schemes:

(7) (a · b) · c → a · (b · c)

(8) a · (b · c) → (a · b) · c

We shall abbreviate x → y and y → x to x ↔ y, and say x is equivalent to y. Then (7) and (8) may be combined to

a · (b · c) ↔ (a · b) · c

The following are now theorems in the associative syntactic calculus:

a\(b/c) ↔ (a\b)/c
a/(b · c) ↔ (a/c)/b
(a · b)\c ↔ b\(a\c)

For example, we prove a\(b/c) → (a\b)/c by putting d = a\(b/c) and proceeding as follows:

1. d → a\(b/c)                     (axiom (1))
2. a · d → b/c                     from 1 by (6)
3. (a · d) · c → b                 from 2 by (4)
4. a · (d · c) → (a · d) · c       (axiom (8))
5. a · (d · c) → b                 from 4 and 3 by (2)
6. d · c → a\b                     from 5 by (5)
7. d → (a\b)/c                     from 6 by (3)

Another theorem is (a/b) · (b/c) → a/c, which is proved as follows, making use of the derived rule of inference (a):

1. b/c → b/c                                       (axiom)
2. (b/c) · c → b                                   from 1 by (4)
3. a/b → a/b                                       (axiom)
4. (a/b) · ((b/c) · c) → (a/b) · b                 from 3 and 2 by (a)
5. (a/b) · b → a                                   from 3 by (4)
6. (a/b) · ((b/c) · c) → a                         from 4 and 5 by (2)
7. ((a/b) · (b/c)) · c → (a/b) · ((b/c) · c)       (axiom (7))
8. ((a/b) · (b/c)) · c → a                         from 7 and 6 by (2)
9. (a/b) · (b/c) → a/c                             from 8 by (3)



Exercises

1. Show that the following is a derived rule of inference in the syntactic calculus: if a → b and c → d then a/d → b/c.

2. In the associative syntactic calculus prove the following:

a. a/b → (a/c)/(b/c)

b. (a\b)/c → a\(b/c)

c. a/(b · c) → (a/c)/b

3.4 Multiplicative systems 

A number of algebraic systems have been used, or could be used, in applications to linguistics. We shall begin by considering a multiplicative system (A, ∗), consisting of a set A and a binary operation ∗ : A × A → A. No associative law is postulated, so we must distinguish between (a ∗ b) ∗ c and a ∗ (b ∗ c).

Here is one illustration: (N, ∗), the set of natural numbers with operation a ∗ b = a^b. Clearly, the associative law does not hold, as (a^b)^c ≠ a^(b^c) in general. Of course, many multiplicative systems do satisfy the associative law; they are then called semigroups. Examples of semigroups are (N, +), (N, ·), (N, max), (N, min), (N, gcd), (N, lcm), (N, first), (N, last), where

gcd(a, b) = greatest common divisor of a and b 

lcm(a, b) = least common multiple of a and b 

first(a, b) = a 

last(a, b) = b 






The operations first and last are the projections we have met earlier. They are the only ones among the listed operations which are not commutative; for example,

first(a, b) = a ≠ b = first(b, a)

in general. In all these examples, the associative law is established on the same principle, for example,

max(max(a, b), c) = max(a, b, c) = max(a, max(b, c))
first(first(a, b), c) = first(a, b, c) = first(a, first(b, c))

To give a motivation for the syntactic calculus studied in the preceding section, we shall define three operations on the subsets of a multiplicative system (A, ·). Let X, Y, and Z be subsets of A; then we define

X · Y = {x · y | x ∈ X ∧ y ∈ Y}
Z/Y = {x ∈ A | ∀y∈Y, x · y ∈ Z}
X\Z = {y ∈ A | ∀x∈X, x · y ∈ Z}

It is now easily seen that

X · Y ⊆ Z if and only if X ⊆ Z/Y
          if and only if Y ⊆ X\Z

and this should give some motivation for rules (3) to (6) in Section 3.3. Here is a proof of the first statement:

X · Y ⊆ Z if and only if ∀x∈X ∀y∈Y, x · y ∈ Z
          if and only if ∀x∈X, x ∈ Z/Y
          if and only if X ⊆ Z/Y

The second statement is proved similarly. 
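For a finite multiplicative system all three operations can be computed directly from their definitions. The sketch below (ours, with an invented example) confirms the displayed equivalence on one instance.

    def product(X, Y, op):
        return {op(x, y) for x in X for y in Y}

    def over(Z, Y, A, op):      # Z/Y = {x in A | for all y in Y, x.y in Z}
        return {x for x in A if all(op(x, y) in Z for y in Y)}

    def under(X, Z, A, op):     # X\Z = {y in A | for all x in X, x.y in Z}
        return {y for y in A if all(op(x, y) in Z for x in X)}

    # Invented example: A = {0, ..., 7} under max, a semigroup from this section.
    A = set(range(8))
    X, Y, Z = {1, 2}, {3}, {0, 1, 2, 3}
    print(product(X, Y, max) <= Z)       # True: X.Y is contained in Z
    print(X <= over(Z, Y, A, max))       # True: hence X is contained in Z/Y
    print(Y <= under(X, Z, A, max))      # True: and Y is contained in X\Z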

Given a set Σ, we shall define a bracketed string in Σ as follows:

BS-1. all elements of Σ are bracketed strings in Σ,

BS-2. if A and B are bracketed strings in Σ, so is (AB),

BS-3. nothing is a bracketed string unless this follows from BS-1 and BS-2.

For example, if a, b and c are elements of Σ, then ((ab)c) and (a(bc)) are distinct bracketed strings of length 3. In practice, the outside parentheses are usually omitted. We may define an operation · between bracketed strings by putting A · B = (AB). Then the bracketed strings form a multiplicative system, called the free multiplicative system generated by Σ. Perhaps the syntactic calculus should really be considered as dealing






with subsets of the free multiplicative system generated by the vocabulary of the language we wish to study, these subsets being interpreted as types.

We shall conclude this section by looking at a purely mathematical problem. Given elements a₁, a₂, . . ., aₙ of Σ, how many distinct bracketed strings are there which contain exactly these elements in the given order? Let f(n) be this number. When n = 1, 2, 3, and 4, we can easily calculate f(n) directly.



n    bracketed strings                                            f(n)
1    a                                                            1
2    (ab)                                                         1
3    ((ab)c), (a(bc))                                             2
4    (((ab)c)d), ((ab)(cd)), (a(b(cd))), (a((bc)d)), ((a(bc))d)   5

Can we obtain an explicit formula for f(n)? First we shall obtain a recursion formula. For n > 1, every bracketed string of length n has, by definition, the form (AB), where A and B are bracketed strings of lengths k and n − k, respectively, where 1 ≤ k ≤ n − 1. Therefore

(*) f(n) = Σₖ₌₁ⁿ⁻¹ f(k)f(n − k) = f(1)f(n − 1) + f(2)f(n − 2) + · · · + f(n − 1)f(1)

This formula does allow us to calculate f(n) inductively, but we shall do better than that. Consider the formal infinite power series in the indeterminate x:

F(x) = Σₙ₌₁^∞ f(n)xⁿ = f(1)x + f(2)x² + · · ·






Then F(0) = 0 and we have

F(x)² = (Σₖ₌₁^∞ f(k)xᵏ)(Σₗ₌₁^∞ f(l)xˡ)
      = (f(1)x + f(2)x² + f(3)x³ + · · ·)(f(1)x + f(2)x² + f(3)x³ + · · ·)
      = (f(1)f(1))x² + (f(1)f(2) + f(2)f(1))x³ + (f(1)f(3) + f(2)f(2) + f(3)f(1))x⁴ + · · ·
      = f(2)x² + f(3)x³ + f(4)x⁴ + · · ·
      = Σₙ₌₂^∞ f(n)xⁿ
      = Σₙ₌₁^∞ f(n)xⁿ − f(1)x
      = F(x) − x



Thus

F(x)² − F(x) + x = 0

which may be solved using the quadratic formula (be careful, F(x) is the unknown, while x is the constant term!) to yield

F(x) = (1 ± (1 − 4x)^(1/2))/2

Since F(0) = 0 we must choose the minus sign. Thus

F(x) = (1 − (1 − 4x)^(1/2))/2

The expression (1 − 4x)^(1/2) can be expanded using the binomial theorem with exponent 1/2. When this is done, we find that

F(x) = 1/2 − (1/2) Σₙ₌₀^∞ (1/2 choose n)(−4x)ⁿ

and therefore the coefficient f(n) of xⁿ is given by

(**) f(n) = −(1/2) (1/2 choose n) (−1)ⁿ 4ⁿ






We do not wish to leave the answer in this form, but we shall use the fact that the binomial coefficient

(1/2 choose n) = (1/2)(1/2 − 1)(1/2 − 2) · · · (1/2 − n + 1) / (1 · 2 · 3 · · · n)
               = (1/2)(−1/2)(−3/2) · · · (−(2n − 3)/2) / n!
               = (−1)ⁿ⁻¹ (1 · 3 · 5 · · · (2n − 3)) / (2ⁿ n!)

Multiplying top and bottom by 2 · 4 · 6 · · · (2n − 2), we obtain

(1/2 choose n) = (−1)ⁿ⁻¹ (2n − 2)! / (2ⁿ n! 2ⁿ⁻¹ (n − 1)!)

Substituting this into formula (**), we get

f(n) = (2n − 2)! / (n! (n − 1)!) = (1/n) (2n − 2 choose n − 1)

Writing n − 1 = m, we end up with the neater formula

(***) f(m + 1) = (1/(m + 1)) (2m choose m)    for all m ≥ 0

For example, taking m = 4, we calculate

f(5) = (1/5) (8 choose 4) = (1/5) · (8 · 7 · 6 · 5)/(1 · 2 · 3 · 4) = 14

The formula for f(n) was discovered by Catalan in 1838.
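The recursion (*) and the closed formula (***) are easily checked against one another by machine; a short sketch (ours):

    from functools import lru_cache
    from math import comb

    @lru_cache(None)
    def f_rec(n):                         # the recursion (*)
        if n == 1:
            return 1
        return sum(f_rec(k) * f_rec(n - k) for k in range(1, n))

    def f_formula(n):                     # the formula (***) with m = n - 1
        m = n - 1
        return comb(2 * m, m) // (m + 1)

    print([f_rec(n) for n in range(1, 8)])                        # [1, 1, 2, 5, 14, 42, 132]
    print(all(f_rec(n) == f_formula(n) for n in range(1, 20)))    # True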



Exercises 

1. Show that (N,gcd) is a semigroup. 

2. Calculate f(6) twice, once using the recursion formula (*) and once using the explicit formula (***).

3. For each a ∈ N, let (a) = {a ∗ x | x ∈ N} be the set of all multiples of a. In the multiplicative system (N, ∗), prove that (a) ∗ (b) = (a ∗ b). Hint: (a) ∗ (b) = {a ∗ b ∗ x ∗ y | x, y ∈ N}.






4. In the notation of this section and Exercise 3, show that (c)/(a) = (c/d), where d = gcd(a, c). Hint: Check that

(c)/(a) = {y ∈ N | ∀x∈(a), y ∗ x ∈ (c)}
        = {y ∈ N | ∀u∈N, y ∗ a ∗ u ∈ (c)}
        = {y ∈ N | c divides y ∗ a}

Moreover, writing c = c₁d and a = a₁d, use number theory to show that c divides y ∗ a if and only if c₁ divides y.

3.5 Semigroups and monoids 

In the previous section, we defined a semigroup as a multiplicative system (A, ·) satisfying the associative law:

(a · b) · c = a · (b · c), for all a, b, c ∈ A

A monoid (A, ·, 1) consists of a semigroup (A, ·) together with an element 1 ∈ A such that

a · 1 = a = 1 · a, for all a ∈ A

We call 1 the unity element.

The unity element is unique, since if both 1 and 1′ satisfied those equations, we would have

1 = 1 · 1′ = 1′

Examples of monoids are (N, +, 0), (N, ∗, 1), (N, max, 0), (N, gcd, 0), and (N, lcm, 1). A little reflection will show that (N, min), (N, first), and (N, last) are not monoids.

A group (A, ·, 1, ⁻¹) consists of a monoid (A, ·, 1) together with an operation ⁻¹ : A → A such that

a · a⁻¹ = 1 = a⁻¹ · a, for all a ∈ A

A group is called Abelian if also

a · b = b · a, for all a, b ∈ A

Examples of Abelian groups are the integers under addition and the positive rationals under multiplication. An example of a non-Abelian group is the set of all matrices

( a  b )
( c  d )

where a, b, c and d are integers such that ad − bc = 1, under the usual multiplication of matrices.






It has been suggested to change the notation n\s to n⁻¹ · s, and to treat syntactic types as elements of a group. Indeed, the type of poor John works would then be computed as (n · n⁻¹) · n · (n⁻¹ · s) = s. Unfortunately the type of *John poor works would also be n · (n · n⁻¹) · (n⁻¹ · s) = s, and so would that of many other unacceptable strings of English words.

One place where we can apply the theory of groups is to the language of mathematical physics. The grammatical sentences of this language are formulas or equations both sides of which have the same dimension. The types or dimensions form an Abelian group. One takes as basic dimensions L (= length), T (= time), and M (= mass). In a formula such as Fl = mv² one assigns dimensions

m → M
v → LT⁻¹
l → L
F → MLT⁻²

Using the laws of an Abelian group, we find that both sides of the given equation have dimension ML²T⁻², hence it is a grammatical sentence. This is not to say that the formula is true; the correct formula may very well read Fl = (1/2)mv². On the other hand, the equation Fl² = mv is not even a grammatical sentence.

In this manner it is customary to assign to every physical quantity a dimension MᵅLᵝTᵞ, where α, β and γ are rational numbers. Since this dimension is completely determined by the triple (α, β, γ), we see that the dimensions form an Abelian group, in fact a three-dimensional vector space over the field of rational numbers.

Most useful in applications to linguistics are the so-called free semigroup and monoid generated by a set Σ. In view of these applications, Σ will be called the vocabulary here, although it is often called the "alphabet".

A string in Σ of length n ≥ 1 is an expression a₁a₂ · · · aₙ, where a₁, a₂, . . . , aₙ ∈ Σ. Strings are multiplied by juxtaposition:

(a₁a₂ · · · aₙ) · (b₁b₂ · · · bₖ) = a₁a₂ · · · aₙb₁b₂ · · · bₖ

With this multiplication, the set of all strings in Σ is a semigroup, called the free semigroup generated by Σ.

For reasons of elegance, people often consider the free monoid Σ* generated by Σ instead. To construct this, we only have to admit the empty string 1 of length 0, and put

A · 1 = A = 1 · A

for each string A ∈ Σ*.






We shall consider three examples, when Σ has no element, one element, and two elements respectively.

When Σ is the empty set, then so is the free semigroup generated by Σ, but the free monoid Σ* = {1} must contain the empty string as an element, hence is not empty.

When Σ = {a}, the free semigroup consists of all strings aⁿ = a · · · a (n times), where n ≥ 1. The free monoid Σ* = {1, a, a², . . .} = {aⁿ | n ∈ N}, provided we define a⁰ = 1. Multiplication of aⁿ and aᵏ yields aⁿ · aᵏ = aⁿ⁺ᵏ; in fact the free monoid (Σ*, ·, 1) is isomorphic to (N, +, 0). (An isomorphism between two monoids is a one-to-one correspondence such that the binary operation and the unity element of one monoid correspond to those of the other.) Note that aⁿ · aᵏ = aᵏ · aⁿ, hence Σ* is commutative.

When Σ = {σ, τ}, where σ ≠ τ, the free semigroup is

{σ, τ, σσ, στ, τσ, ττ, σσσ, . . .}

and the free monoid is the union of this and {1}. As στ and τσ are different strings, we see that Σ* does not satisfy the commutative law.

We may also consider free semigroups and monoids generated by more than two elements. Indeed, the vocabulary of English has a very large number of elements. However, in principle, for any vocabulary of more than two elements, this number can be reduced to two by coding, as we shall see.

For instance, in English we may start by reducing the vocabulary to an alphabet of 27 elements (including the space between words), and these 27 elements can be further reduced to 3 symbols using the Morse code: •, —, and |, the last being the divider between successive letters. Finally we may represent • = τστ, — = τσστ, and | = ττ, where σ is a click and τ is a pause, as in telegraphy.

Exercises 

1. If a semigroup can be made into a monoid by selecting a unity element, show that the 
unity element is uniquely determined. 

2. If a monoid can be made into a group by specifying the operation ⁻¹, show that x⁻¹ is uniquely determined by x.

In the following exercises, two monoids are called isomorphic if they become identical 
when the elements and operations are suitably renamed. 

3. Prove that the monoid (N, ·, 1) is not isomorphic to the free monoid generated by the empty set.

4. Prove that the monoid (N, ·, 1) is not isomorphic to the free monoid generated by {σ}, but that the latter is isomorphic to (N, +, 0).

5. Prove that the monoid (N, ·, 1) is not isomorphic to the free monoid generated by a set Σ with two or more elements.






Chapter 4 

Context-free grammars 



4.1 Grammars for some fragments of English 

Given a vocabulary Σ, let Σ* be the free monoid generated by Σ, that is, the set of all strings of elements of Σ, including the empty string 1. By a language in Σ we mean a subset L ⊆ Σ*. We want to describe L by a finite set of grammatical rules. Let us begin by looking at an example taken from English.

4.1.1 Example

Σ = {the, a, man, ball, cat, hit, took}

L = { the | a } { man | ball | cat } { hit | took } { the | a } { man | ball | cat }

so that, for example, the man hit the ball and a ball took a man are sentences. This language consists of a total of 2 · 3 · 2 · 2 · 3 = 72 sentences. We could adjoin a symbol S (= sentence) to the vocabulary and list 72 grammatical rules of the form:

S → the man hit the ball



A more customary method is to adjoin an auxiliary vocabulary

{S, NP, VP, Art, N, Vt}

of grammatical terms, where

S = sentence          Art = article
NP = noun phrase      N = noun
VP = verb phrase      Vt = transitive verb




Instead of the above 72 grammatical rules, only 10 are required:

S → NP VP        Art → the, a
NP → Art N       N → man, ball, cat
VP → Vt NP       Vt → hit, took

Note that, compared with the categorial grammars studied earlier, the present arrow points in the reverse direction. This is because we now take the point of view of the speaker who generates the sentence, whereas previously we took the point of view of the listener who analyses the sentence. We subject the arrow to the usual reflexive, transitive, and substitution rules:

(R) A → A        (T)  A → B    B → C        (S)  A → B    C → D
                      --------------             --------------
                          A → C                     AC → BD

To show that the man hit the ball is a sentence, we could proceed quite formally thus:

1. Art → the, N → ball; hence Art N → the ball        by (S)
2. NP → Art N; hence NP → the ball                    by (T), from 2 and 1
3. Vt → hit; hence Vt NP → hit the ball               by (S), from 3 and 2
4. VP → Vt NP; hence VP → hit the ball                by (T)
5. Art → the, N → man; hence Art N → the man          by (S)
6. NP → Art N; hence NP → the man                     by (T)
7. NP VP → the man hit the ball                       by (S), from 6 and 4
8. S → NP VP; hence S → the man hit the ball          by (T)

In practice, this proof in tree form may be abbreviated as a so-called parsing tree, shown as follows:

S
  NP
    Art: the
    N: man
  VP
    Vt: hit
    NP
      Art: the
      N: ball




The reader may argue that there is too much theory just to replace 72 rules by 10. 
The next example will show that a language may quite easily consist of an infinite number 
of sentences (most languages do), and that grammar then becomes a matter of necessity 
rather than choice. 
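Grammars of this kind are entirely mechanical, and a machine can recover the whole language from the 10 rules. The sketch below (ours) generates all 72 sentences of Example 4.1.1 by always rewriting the leftmost nonterminal:

    rules = {
        'S':   [['NP', 'VP']],
        'NP':  [['Art', 'N']],
        'VP':  [['Vt', 'NP']],
        'Art': [['the'], ['a']],
        'N':   [['man'], ['ball'], ['cat']],
        'Vt':  [['hit'], ['took']],
    }

    def sentences(form):
        for i, w in enumerate(form):
            if w in rules:                            # expand the leftmost nonterminal
                for rhs in rules[w]:
                    yield from sentences(form[:i] + rhs + form[i + 1:])
                return
        yield ' '.join(form)                          # no nonterminal left: a sentence

    all_sentences = list(sentences(['S']))
    print(len(all_sentences))                         # 72
    print(all_sentences[0])                           # the man hit the man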

4.1.2 Example Σ is as above with three new words added: and, or, but.

New grammatical term: Conj = conjunction.
New grammatical rules: S → S Conj S
                       Conj → and, or, but.

The total number of grammatical rules is now 14, but the total number of sentences is infinite, for example

the man hit the ball and the man hit the ball and · · ·

4.1.3 Example One may argue that the conjunctions and and or can be placed not only between sentences but also between noun phrases, verb phrases and transitive verbs. This is not so for the conjunction but. To allow for this we rewrite the grammar of Example 4.1.2 in the following modified form:

S → S Conj S            Conj → and, or
NP → NP Conj NP         S → S but S
VP → VP Conj VP         Vt → Vt Conj Vt



The kind of grammar described in this section is known as a "context-free phrase- 
structure grammar" or just a "context-free grammar". A precise definition of this concept 
will be given in the next section. 






Exercise 

1. Give parsing trees for each of the following sentences, using the grammar of Exam- 
ple 4.1.3: 

the man and the cat hit the ball 

the man took and hit the ball 
the man hit the ball and the cat 
the man hit the cat and the cat hit the ball 

4.2 Production grammars and deductive systems 

Before we give a precise definition of what we mean by the term "context-free grammar" 
introduced in Section 4.1, we shall consider a more general concept. 

4.2.1 Definition A production grammar or rewriting system G = (V, Σ, Σ′, P) consists of the following data:

PG-1. a finite set V called the vocabulary (its elements are called words);

PG-2. two nonempty subsets Σ and Σ′ of V called the terminal and initial vocabulary respectively;

PG-3. a finite subset P of V* × V*, that is, a finite set of pairs of strings of words. These pairs are called productions, and we write A → B instead of saying that (A, B) is an element of P.

These data are usually subjected to the following restrictions:

PG-4. Σ and Σ′ have no elements in common.

PG-5. Σ′ has only one element, that is, Σ′ = {S}, where S is not in Σ.

PG-6. In a production A → B, A is not a nonempty string of terminal words;

PG-7. In a production A → B, B is not a nonempty string of initial words.
COMMENTS. 

(a) Sometimes V is called an "alphabet"; its elements are then called "letters".






(b) We are particularly reluctant to adopt the restriction PG-5. Let us see what we could do without this restriction. Σ′ might consist of sentence types: statement, question, command, etc.

If Σ consists of all English words, V − Σ would consist of grammatical terms such as NP, VP etc., and the productions would be S → NP VP etc. Thus G would be a grammar of English from the point of view of the speaker. However, we get another example if we take Σ′ to be the vocabulary of English, Σ the set of sentence types, and if we reverse the arrows in all of the above productions, so that a new production would be NP VP → S, for example. Then the new grammar G would be a grammar of English from the point of view of the hearer.

(c) The reason for the restriction PG-4, for example, is that, in a grammar of English from the point of view of the speaker, a terminal string such as the man hit the cat should not be rewritten. Following Chomsky, a grammar is said to be

(0) of type 0, or a generative grammar, if it satisfies the above restrictions PG-4, PG-5, PG-6 and PG-7. Usually people are interested in grammars that satisfy further restrictions.

(1) of type 1, or a context sensitive grammar, if, in every production A → B, A is not longer (contains not more words) than B;

(2) of type 2, or a context-free grammar, if, in every production A → B, A is a nonterminal word and B is nonempty;

(3) of type 3, or a finite state grammar, if every production has the form A → σ or A → σB, where A is a nonterminal word, σ is a terminal word and B is any word.

Clearly, finite state ⊆ context-free ⊆ context sensitive ⊆ generative. We shall not explain the words "context-sensitive" and "context-free". In fact, we shall disregard context sensitive grammars altogether and, for the time being, confine attention to context-free grammars.

With any grammar G we associate a deductive system D(G). Its formulas are expressions of the form C → D, where C and D ∈ V*. An axiom of D(G) is a production A → B or an instance A → A of the reflexive law (R). Moreover, there are two rules of inference, the transitive law:

(T)  A → B    B → C
     --------------
         A → C

and the substitution rule:

(S)  A → B    C → D
     --------------
        AC → BD






Proofs are defined inductively: every axiom is a proof, and applying one of the rules of inference to two proofs we get a new proof. The last formula in a proof is the theorem which has been proved.

Proofs are usually written in tree form; see for example the proof of the formula

S → the man hit the ball

in Section 4.1. This formula is therefore a theorem in the deductive system D(G), where G is the grammar of Section 4.1, Example 4.1.1. The substitution rule (S) is sometimes given a different formulation:

(S′)    A → B
     -----------
     CAD → CBD

It is easy to derive (S′) from (S) as follows:

C → C (R)    A → B
------------------ (S)
     CA → CB             D → D (R)
     ----------------------------- (S)
             CAD → CBD

Conversely, one may derive (S) from (S′). Note that AC = 1AC, BC = BC1.

A → B                    C → D
----------- (S′)         ----------- (S′)
AC → BC                  BC → BD
        ------------------------ (T)
               AC → BD

Grammatical facts are expressed by formulas which can be proved in the deductive system D(G). We are particularly interested in theorems of the form S → E, where S is an initial word and E ∈ Σ* is a string in the terminal vocabulary. The language or set of sentences generated by the grammar G is the set L(G) consisting of all E ∈ Σ* for which S → E is provable in D(G).

In the examples of Section 4.1, Σ is a fragment of the vocabulary of English, V − Σ consists of grammatical terms, Σ′ = {S}, and the productions are the grammatical rules stated there. We shall now give two other illustrations of certain artificial languages, built up from a vocabulary consisting of two words, say σ and τ.



4.2.2 Example

L₁ = {στ, σσττ, σσστττ, . . .} = {σⁿτⁿ | n ≥ 1}






We wish to find a context-free grammar G₁ so that L₁ = L(G₁). Take V = {σ, τ, S} and let P consist of the following two productions:

P-1. S → στ,

P-2. S → σSτ.

Here, for example, is a proof of the formula S → σσττ, showing that σσττ ∈ L₁:

                 S → στ (P-1)
                 ------------ (S′)
S → σSτ (P-2)    σSτ → σσττ
---------------------------- (T)
          S → σσττ

It can easily be seen that, for any string E ∈ Σ*, E ∈ L₁ if and only if S → E is deducible from P-1 and P-2, that is, provable in the deductive system D(G₁). While this fact ought to be clear intuitively, a proof will be given in Section 4.7, just to show that it is possible to give such a proof.
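For this particular grammar the intuitive fact is also easy to test by machine: a string is deducible from P-1 and P-2 precisely when undoing P-2 repeatedly, and P-1 once, reduces it to S. A sketch (ours; we write s for σ and t for τ):

    def in_L1(E):
        while len(E) > 2:
            if E[0] == 's' and E[-1] == 't':
                E = E[1:-1]                 # undo P-2: S -> sSt
            else:
                return False
        return E == 'st'                    # finally undo P-1: S -> st

    print(in_L1('sstt'), in_L1('ssstttt'), in_L1('ts'))   # True False False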

4.2.3 Example

Σ = {σ, τ}, σ ≠ τ

L₂ = {σσ, ττ, σττσ, τσστ, . . .} = {AA# | A ∈ Σ* and A ≠ 1}

where A# is the mirror image of A. That is to say, if A = a₁a₂ . . . aₙ₋₁aₙ, where the aᵢ ∈ Σ, then A# = aₙaₙ₋₁ . . . a₂a₁. We wish to find a context-free grammar G₂ such that L₂ = L(G₂).

Again take V = {σ, τ, S} and let P consist of the following productions:

P-3. S → σσ,

P-4. S → ττ,

P-5. S → σSσ,

P-6. S → τSτ.

It can now be easily seen that a string E ∈ Σ* is a sentence of L₂ if and only if S → E is deducible from P-3 to P-6.

REMARK. The language L₁′ = {1, στ, σσττ, . . .} = {σⁿτⁿ | n ≥ 0} can be described more elegantly by a grammar with productions S → 1 and S → σSτ. However, L₁′ is not the same language as L₁; it has one additional sentence, the empty string, and the grammar is not context-free. Similarly the language L₂′ = {AA# | A ∈ Σ*} can be described more elegantly than L₂ by a grammar with productions S → 1, S → σSσ and S → τSτ. This grammar is also not context-free.






Exercises

1. Deduce S → σσστττ for L₁.

2. Deduce S → τσττστ for L₂.

3. Find a context-free grammar that will generate the language

{σⁿρτⁿ | n ≥ 1}

for the terminal vocabulary Σ = {ρ, σ, τ}.

4. What language is generated by a context-free grammar

(V, {σ, τ}, {S}, P)

with vocabulary

V = {σ, τ, S, U, V}

and productions

P = {U → σ, U → σU, V → τ, V → τV, S → UV}



4.3 Grammars of some formal languages 

Context-free grammars are useful not only for describing natural languages such as Eng- 
lish, but also for describing certain formal languages created by mathematicians. In 
particular, we shall discuss three such languages: the propositional calculus, the language 
of set theory, and a programming language for the abacus. 

4.3.1 Example The propositional calculus. Its vocabulary is usually presented as follows:

Variables: p, q, r, . . .
Connectives: ¬, ⇒, ∧, ∨,
Parentheses: (, ).

The set of sentences (= formulas) is usually described inductively as follows.

F-1. Every variable is a sentence.

F-2. If A is a sentence so is ¬A.

F-3. If A and B are sentences so are (A ⇒ B), (A ∨ B), and (A ∧ B).

F-4. Nothing else is a sentence.






This is to say, the set of sentences is the smallest set of strings in the given vocabulary which contains the variables and is closed under F-2 and F-3. For example, we have the following two sentences with their construction: for the first, p and q are sentences by F-1, hence (p ⇒ q) by F-3, and since r is a sentence by F-1, so is ((p ⇒ q) ∨ r) by F-3; for the second, q and r are sentences by F-1, hence (q ∨ r) by F-3, and since p is a sentence by F-1, so is (p ⇒ (q ∨ r)) by F-3.

Usually people omit outside parentheses. Thus the above two sentences may be written without ambiguity:

(p ⇒ q) ∨ r,    p ⇒ (q ∨ r)

However it won't do to omit parentheses altogether, as the expression p ⇒ q ∨ r is ambiguous.

Actually, the right parentheses are redundant, and one could still distinguish the following:

((p ⇒ q ∨ r,    (p ⇒ (q ∨ r

Parentheses may be redundant, but they do facilitate reading. In English, ambiguity is avoided in a similar fashion. We say

if p then q in place of (p ⇒ q
either p or q in place of (p ∨ q
both p and q in place of (p ∧ q

The above two formulas are transcribed into English thus:

either if p then q or r,    if p then either q or r

The second formation is more common than the first. The reason for this lies in the limited capacity of the short term memory in the human brain.

A word should be said about the bracketless or "Polish" notation. One may avoid parentheses altogether by writing connectives in front, thus ⇒pq, ∨pq, ∧pq in place of (p ⇒ q), (p ∨ q), (p ∧ q). The above two formulas would become in Polish notation: ∨⇒pqr, ⇒p∨qr.

Economy of writing and ease of reading do not always seem to agree! However, it is not clear if the difficulty of reading is due to our lack of familiarity or this notation is inherently harder to understand. After all, we use a form of this in vertical mode when we learn arithmetic in elementary school.

After this digression, let us return to the propositional calculus and obtain a context- 
free grammar for it. 






First, we must make the vocabulary finite, so let us replace p, q, r, . . . by p, p′, p″, . . . Thus we have

Σ = {p, ′, ¬, ⇒, ∧, ∨, (, )}
V − Σ = {S, V, C}

where

S = sentence
V = propositional variable
C = binary connective

The productions are the following:

V → p          S → V
V → V′         S → ¬S
C → ⇒, ∨, ∧    S → (S C S)
Here, for example, is a parsing tree for the sentence (p ⇒ (p′ ∨ p″)):

S
  S
    V: p
  C: ⇒
  S
    S
      V: p′
    C: ∨
    S
      V: p″

We have not indicated the symbols "(" and ")" explicitly on the tree.

4.3.2 Example The language of set theory. As all mathematics can be reduced to set theory, it is now fashionable to regard this also as the language of mathematics. We present a context-free grammar for set theory.

Σ = {x, ′, =, ∈, ¬, ⇒, ∧, ∨, ∀, ∃, (, )}



V − Σ = {S, V, Q, C}

where

S = sentence
V = individual variable
Q = quantifier
C = binary connective

Productions are the following:

V → x          S → (V = V)
V → V′         S → (V ∈ V)
Q → ∀V         S → ¬S
Q → ∃V         S → (S C S)
C → ⇒, ∨, ∧    S → Q S

We give an example of a parsing tree for the sentence:

∀x ∃x′ ((x = x) ∨ ¬(x ∈ x′))

S
  Q: ∀ V: x
  S
    Q: ∃ V: x′
    S
      S: (x = x)
      C: ∨
      S: ¬ S: (x ∈ x′)

We have not indicated the symbols "(" and ")" explicitly.

Of course, the above grammar does not tell us everything we want to know about the 
language of mathematics. It does not distinguish between open sentences, which contain 
free variables, and closed sentences, which do not. Above all, it does not generate the one 
set we are really interested in, the set of theorems. 






4.3.3 Example A programming language for the abacus. For purposes of illustration, recall the program: transfer content of A to B.

[Figure: start → node 1: A⁻ → node 2: B⁺ → back to node 1; from node 1 to stop when A is empty.]

We may regard this program as consisting of the following instructions:

start:  go to 1
1:      A⁻ go to 2 else go to stop
2:      B⁺ go to 1.



We wish to consider these instructions as sentences of a language.

To ensure a finite vocabulary, we replace A, B, C, . . . by A, A′, A″, . . . and 1, 2, 3, . . . by 1, 1′, 1″, . . . Thus

Σ = {start, stop, A, 1, +, −, ′, go to, else, :}
V − Σ = {S, L, N, M}

where

S = sentence (= instruction)
L = location
N = node other than start or stop
M = node other than start.

Productions:

L → A          M → N
L → L′         M → stop
N → 1          S → start: go to M
N → N′         S → N: L⁺ go to M
               S → N: L⁻ go to M else go to M


Exercises

1. Write out a parsing tree for ((p ⇒ p′) ∨ p″).

2. Along the lines of Example 4.3.3 above, what instructions are needed for the program: copy content of A into B?






4.4 New languages from old 

Given a terminal vocabulary Σ and a language L ⊆ Σ*, we may form two new languages in the same vocabulary:

−L = {A ∈ Σ* | A ∉ L}
L# = {A# ∈ Σ* | A ∈ L}

−L is the complement of L and L# is the mirror image of L. Given two languages L and L′ in the same terminal vocabulary Σ, we may form three new languages as follows:

L ∪ L′ = {A ∈ Σ* | A ∈ L ∨ A ∈ L′}
L ∩ L′ = {A ∈ Σ* | A ∈ L ∧ A ∈ L′}
LL′ = {AA′ ∈ Σ* | A ∈ L ∧ A′ ∈ L′}

L ∪ L′ is the union, L ∩ L′ is the intersection, and LL′ is the product of the given languages. As an example of the product of two or more languages we may form

L² = {AB ∈ Σ* | A, B ∈ L}
L³ = {ABC ∈ Σ* | A, B, C ∈ L}

and so on. Finally, we are able to construct

L* = {A ∈ Σ* | ∃n≥1, A ∈ Lⁿ}

Thus L* consists of all strings which are made up by juxtaposing any number of strings of L; that is, the sentences of L* are the texts of L. For example, if L = {a}, then L² = {a²}, L³ = {a³}, and L* = {a, a², a³, . . .}.
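For finite languages these operations can be computed literally. The sketch below (ours) works with strings of letters rather than strings of words, and truncates the star at a given number of factors so that the result stays finite.

    def mirror(L):
        return {A[::-1] for A in L}

    def concat(L, M):
        return {A + B for A in L for B in M}

    def star(L, max_n):                 # texts of at most max_n factors
        result, power = set(), set(L)
        for _ in range(max_n):
            result |= power
            power = concat(power, L)
        return result

    L1 = {'st', 'sstt'}
    L2 = {'ss', 'tt'}
    print(mirror(L1))                   # {'ts', 'ttss'}
    print(sorted(concat(L1, L2)))       # ['ssttss', 'sstttt', 'stss', 'sttt']
    print(sorted(star({'st'}, 3)))      # ['st', 'stst', 'ststst']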

4.4.1 Theorem If L and L′ have context-free grammars then so do L#, L*, L ∪ L′, and LL′.

Proof We shall assume that L has vocabulary V, initial element S ∈ V − Σ, and productions P. We also assume that L′ has vocabulary V′, initial element S′ ∈ V′ − Σ, and productions P′. Without loss of generality, we may also assume that V − Σ has no symbols in common with V′ − Σ; otherwise we could change the notation to make this so. Thus V ∩ V′ = Σ.

(1) For the grammar of L ∪ L′ we take the vocabulary V ∪ V′ ∪ {T}, assuming that T is a new symbol, not in V or V′. T will be the initial element for L ∪ L′. As productions we take P ∪ P′ ∪ {T → S, T → S′}. It is easily seen that, for any string E ∈ Σ*, T → E is deducible from these productions if and only if E ∈ L ∪ L′.

(2) For the grammar of LL′ we take the same vocabulary as in (1), and the set of productions P ∪ P′ ∪ {T → SS′}. It is easily seen that, for any string E ∈ Σ*, T → E is deducible from these productions if and only if E ∈ LL′.






(3) L* has sentences of the form A₁A₂ · · · Aₖ, where A₁, A₂, . . ., Aₖ ∈ L, and k ≥ 1. We take as vocabulary V ∪ {T}, where T is a new symbol that will serve as the initial element. We take as productions the set P ∪ {T → S, T → ST}. It follows that T → S², T → S³, etc. It is easily seen that for any string E ∈ Σ*, T → E is deducible from the above productions if and only if E ∈ L*.

(4) For the grammar of L# we take the same vocabulary as for L but replace the productions of L by P#, defined so that P# contains the production A# → B# for every production A → B of P. It is easily seen that, for any string E ∈ Σ*, S → E is deducible from these new productions if and only if S = S# → E# was deducible from the old productions, that is, E# ∈ L, that is, E ∈ L#.

Up to verifying the statements about which it was said that they could easily be seen, our proof is now complete.
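The four constructions are completely mechanical and can be written as operations on grammars. In the sketch below (ours), a grammar is a dict from nonterminals to lists of right-hand sides, and the fresh symbol T is assumed not to occur already.

    def union(P1, S1, P2, S2, T='T'):         # construction (1)
        return {**P1, **P2, T: [[S1], [S2]]}, T

    def product(P1, S1, P2, S2, T='T'):       # construction (2)
        return {**P1, **P2, T: [[S1, S2]]}, T

    def star(P1, S1, T='T'):                  # construction (3)
        return {**P1, T: [[S1], [S1, T]]}, T

    def mirror(P1, S1):                       # construction (4): reverse every right-hand side
        return {A: [rhs[::-1] for rhs in rules] for A, rules in P1.items()}, S1

    G1 = {'S1': [['s', 't'], ['s', 'S1', 't']]}                   # L1, with s, t for sigma, tau
    G2 = {'S2': [['s', 's'], ['t', 't'], ['s', 'S2', 's'], ['t', 'S2', 't']]}
    print(union(G1, 'S1', G2, 'S2'))
    print(mirror(G1, 'S1'))    # productions S1 -> ts and S1 -> t S1 s, as listed below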

To illustrate this theorem and its proof, we recall the languages L₁ and L₂ of Section 4.2 as well as their grammars. To make sure that the auxiliary vocabularies have no symbols in common, we replace their initial elements by S₁ and S₂ respectively.

The productions for L₁ were:

S₁ → στ,    S₁ → σS₁τ.

The productions for L₂ were:

S₂ → σσ,    S₂ → ττ,    S₂ → σS₂σ,    S₂ → τS₂τ.

The productions for L₁ ∪ L₂ now are:

S₁ → στ,    S₁ → σS₁τ,    S₂ → σσ,    S₂ → ττ,
S₂ → σS₂σ,    S₂ → τS₂τ,    T → S₁,    T → S₂.

T serves as the initial symbol.

The productions for L₁L₂ are:

S₁ → στ,    S₁ → σS₁τ,    S₂ → σσ,    S₂ → ττ,
S₂ → σS₂σ,    S₂ → τS₂τ,    T → S₁S₂

The productions for L₁* are:

S₁ → στ,    S₁ → σS₁τ,    T → S₁,    T → S₁T

The productions for L₁# are:

S₁ → τσ,    S₁ → τS₁σ.





The initial symbol here is still S₁.

The reader may ask why we did not include in the theorem assertions that −L and L ∩ L′ have context-free grammars. It will in fact be shown in Section 4.8 that there exist languages L and L′ with context-free grammars for which L ∩ L′ does not have a context-free grammar.

Assuming that, we shall now prove that the complement of a language with a context-free grammar does not necessarily have a context-free grammar. Indeed, suppose this were the case. Let L and L′ be any two languages (with the same terminal vocabulary) which have context-free grammars. Then

L ∩ L′ = −(−L ∪ −L′)

would also have a context-free grammar, which is a contradiction.
Exercise

1. Find context-free grammars for the following languages:

L₂*,    L₂#,    L₁ ∪ L₂,    L₁#L₂,    (L₁L₂)*,    (L₁ ∪ L₂)#

Here L₁ and L₂ are the languages introduced in Section 4.2.

4.5 How are sentences produced and recognized? 

It is highly unlikely that a person while engaged in uttering a sentence will construct a 
parsing tree in his mind or even carry out a proof in a deductive system. Let us look at 
an example, based on the grammar of Section 4.1. 



                                                  n
S → NP VP                                         2
  → Art N VP                                      3
  → the N VP                                      2
  → the man VP                                    1
  → the man VP Conj VP                            3
  → the man Vt NP Conj VP                         4
  → the man took NP Conj VP                       3
  → the man took Art N Conj VP                    4
  → the man took the N Conj VP                    3
  → the man took the ball Conj VP                 2
  → the man took the ball and VP                  1
  → the man took the ball and Vt NP               2
  → the man took the ball and hit NP              1
  → the man took the ball and hit Art N           2
  → the man took the ball and hit the N           1
  → the man took the ball and hit the cat         0

Note that in going from line 1 to line 2, we apply the production NP → Art N rather than the production VP → VP Conj VP. Again, in going from line 2 to line 3, we apply the production Art → the rather than N → man or VP → VP Conj VP. In other words, we replace nonterminal words as soon as possible, starting from left to right.

We note that at each stage of the generation of the above sentence, part of the sentence 
has already been uttered, while the rest only exists in the form of a plan. The plan consists 
of a string of grammatical terms. The number n of these terms varies, as we go along, from 
2 to 0, the maximum in the above example being 4. We may think of these grammatical 
terms as being stored temporarily in the short term memory of the human brain. 

Psychologists have found that the short term memory is capable of storing at most 
seven "chunks" of information. They define one chunk as a nonsense word. However, 
adapting a proposal by Victor H. Yngve to our present terminology, we are tempted to 
say that each element of the vocabulary counts as one chunk. Yngve himself believed that 
the grammatical rules of a natural language would automatically rule out n > 7. This is 
not quite true, as the following example shows. 

Let us look at the fragment of English consisting of the words: 

John, Jane, Tom, works, rests, complains, if, then 

together with the following productions: 

S → NP VP
NP → John, Jane, Tom
VP → works, rests, complains
S → if S then S.

We shall now form two sentences, which in logical notation have the forms p ⇒ (q ⇒ (r ⇒ s)) and ((p ⇒ q) ⇒ r) ⇒ s.

(1) If John works then if Jane rests then if Tom works then Jane complains 

(2) If if if John works then Jane rests then Tom works then Jane complains 

One may conceivably utter (1), but uttering (2) is inconceivable, although grammatically they are both well-formed. Why is this so? Let us see how (1) would actually be uttered.






                                          n
S → if S then S                           3
  → if NP VP then S                       4
  → if John VP then S                     3
  → if John works then S                  1
  → if John works then if S then S        3

We note that while uttering if, the three items "S then S" must be put into temporary storage. The word then, while belonging to the terminal vocabulary, cannot be uttered until after the word works has been uttered. We easily check that n ≤ 4 throughout the entire process of generating the sentence (1).

The situation is different with (2).

                                                      n
S → if S then S                                       3
  → if if S then S then S                             5
  → if if if S then S then S then S                   7
  → if if if NP VP then S then S then S               8
  → if if if John VP then S then S then S             7
  → if if if John works then S then S then S          5

If our psychological hypothesis is correct, a speaker will get stuck as soon as n ≥ 8.
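The count n can be kept automatically during a leftmost generation: it is the number of symbols of the plan that follow the longest prefix of already-uttered terminal words. A sketch (ours) that replays the first steps of sentence (1):

    rules = {
        'S':  [['if', 'S', 'then', 'S'], ['NP', 'VP']],
        'NP': [['John'], ['Jane'], ['Tom']],
        'VP': [['works'], ['rests'], ['complains']],
    }

    def storage(form):                  # n = symbols not yet uttered
        uttered = 0
        while uttered < len(form) and form[uttered] not in rules:
            uttered += 1
        return len(form) - uttered

    def replay(choices):                # choices: which production to use at each step
        form, peak = ['S'], 1
        for c in choices:
            i = next(j for j, w in enumerate(form) if w in rules)
            form = form[:i] + rules[form[i]][c] + form[i + 1:]
            peak = max(peak, storage(form))
        return ' '.join(form), peak

    print(replay([0, 1, 0, 0]))   # ('if John works then S', 4): n reaches 4, as in the table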

One possible conclusion from this is that some grammatically possible sentences are never uttered, because they would overload the temporary storage capacity of the human brain. We shall drive home the point by considering another example. Let us add to the previous grammar the following productions:

Adj → nice, known, true
S → it is Adj that S
S → that S is Adj

The following sentence causes no difficulty:

(1) It is true that it is known that it is true that Jane works

However, the following sentence is inconceivable:

(2) That that that Jane works is true is known is nice

Indeed, at one stage of its attempted generation we would have:

                                                      n
S → · · · → that that that NP VP is Adj is Adj is Adj 8

REMARK. We should point out the difference between "chunk of information" and "bit of information". The latter has a technical significance: bit = binary digit. If we think of a chunk as being a four letter word, each letter having 26 possibilities, say approximately 2⁵, then 1 chunk is approximately 4 · 5 = 20 bits.

Commentary by MB 

On the other hand, as the following example shows, an explanation in terms of the number of "chunks" to be stored is over-simplified. The following sentence is not a model of graceful style, but still can be processed, while in the course of processing the number of such chunks generated is as much as nine. The sentence uses yet another rule:

S → both S and S

which generates such sentences as

both John is late and Jane is early

and the one that interests us

If both that Jane works is true and that John works is false then Sue sleeps

                                                n
S → if S then S                                 3
  → if both S and S then S                      5
  → if both that S is Adj and S then S          7
  → if both that NP VP is Adj and S then S      8

Consideration of this and sentences that are even worse from the point of view of chunks of information (using the productions S → either S or S and S → neither S nor S) suggests quite a different interpretation of these facts; namely that there are specific processors in the brain that process such constructions and that these processors have no stack. Thus you cannot interrupt the if S then S processor for another construction of the same type, although there is no especial problem with using another type of complex construction that uses a different processor.

Having formed some idea how sentences are produced, we are led to ask how strings of words are recognized as sentences. Let us take the same example as before, now from the point of view of the hearer:

                                       n
the man took the ball and hit the cat
← Art man . . .                        1
← Art N . . .                          2
← NP took . . .                        1
← NP Vt the . . .                      2
← NP Vt Art ball . . .                 3
← NP Vt Art N . . .                    4
← NP Vt NP . . .                       3
← NP VP and . . .                      2
← NP VP Conj hit . . .                 3
← NP VP Conj Vt the . . .              4
← NP VP Conj Vt Art cat                5
← NP VP Conj Vt Art N                  6
← NP VP Conj Vt NP                     5
← NP VP Conj VP                        4
← NP VP                                2
← S                                    1



A number of comments are in order: 

1. We have used the same grammar that had been used for producing the sentence. It
might have been tempting to reverse the arrows, thus operating in the "dual" grammar
with productions Art N → NP, and so on. However, this dual grammar is not
context-free (or even generative, as it has more than one initial word).

2. After line 9 it would have been tempting to write <— S and • • •, but then we would 
have ended up with <— S Conj VP, and there is no production which will justify our 
replacing this by S. Presumably a human hearer makes use of the other cues, for 
instance intonation, to see that the sentence is not yet complete. 

3. The numeral on the right denotes the number of grammatical terms that are held in 
temporary storage at each stage. The number n = 6 in line 13 is dangerously close to 
the upper bound of 7. In fact, when we analyze the sentence 

the man took the ball and hit the cat and the dog 

we do get

                                                          n
← NP VP Conj V_t NP and                                   5
← NP VP Conj V_t NP Conj the                              6
← NP VP Conj V_t NP Conj Art dog                          7
← NP VP Conj V_t NP Conj Art N                            8






What is puzzling is that there are 8 chunks of information in storage, and yet there
is no apparent difficulty in someone's recognizing the above as a sentence. One expla-
nation comes by replacing the production NP → NP Conj NP by two productions:
NP* → NP Conj and NP → NP* NP. Similarly we can replace VP → VP Conj
VP by two productions.

Exercises 

In each of the following exercises, count the chunks of information in storage at each stage. 

1. Show how a person might produce the sentence. 

the man hit the cat and the dog 

2. Show how a person might recognize the string 

the man took the ball and hit the cat and the dog 

as a sentence, utilizing the suggestion made in the last paragraph above. 

4.6 Machines for producing and recognizing sentences 

We shall construct a machine that can produce the sentences of a language which is 
generated by a normal context-free grammar where all productions have one of two 
forms: 

T → W        T → UV

The machine we have in mind is a variation of what is usually called a "pushdown au- 
tomaton" . It consists of two tapes: a storage tape capable of moving in both directions 
and an output tape allowed only to move to the left. There is a scanner which can read 
and write on both tapes. Both tapes are inscribed with words of the same vocabulary V, 
the total vocabulary of the grammar to be considered. 



(Diagram: the scanner reads one square of each of the two tapes, the storage tape above
and the output tape below.)

We have shown the machine at the beginning of an operation: all squares but one are
blank; only the scanned square of the storage tape is imprinted with the symbol S,
representing the fact that the machine plans to utter a sentence.






The machine is non-deterministic: a given situation may give rise to several pos- 
sible moves; in fact, there is usually more than one way of completing a sentence. We 
now describe the moves of the machine, which depend on the given normal context-free 
grammar. 



In what follows we write a situation as a pair (x | y), where x is the scanned square of
the storage tape, y is the scanned square of the output tape, and 0 denotes a blank square.

(1) (T | 0) ⇒ (0 | T)

This means: if the scanned square of the output tape is blank and the scanned square of
the storage tape has T written on it, the machine prints T on the scanned square of the
output tape and erases it on the storage tape.

(2) (0 | 0) ⇒ (0 | 0), the storage tape moving one square to the left

This means: if both scanned squares are blank, the storage tape moves one square to the
left.



(3) (U | W) ⇒ (U | W), the storage tape moving one square to the right

This means: if both scanned squares have words written on them, the storage tape moves
one square to the right.



(4) (0 | T) ⇒ (0 | W), provided T → W is a production of the grammar.

(5) (0 | T) ⇒ (V | U), provided T → UV is a production of the grammar.

(6) (x | T) ⇒ (x | T), the output tape moving one square to the left,

provided T ∈ Σ. Thus, if the scanned square of the output tape contains a word of the
terminal vocabulary Σ, the output tape moves one square to the left.

We shall illustrate the operation of the machine for the fragment of English discussed
in Section 4.1, Example 3. Unfortunately, a production such as

NP → NP Conj NP

does not belong in a normal grammar, so we replace it by two productions:

NP → NP NP'
NP' → Conj NP

We shall describe in detail how our machine would utter the sentence: 

the man hit the cat and the ball 

according to the following parsing tree:

                      S
                _____|_____
               NP          VP
              /  \        /  \
            Art   N     V_t   NP
            the  man    hit  /  \
                           NP    NP'
                          /  \   /  \
                        Art   N Conj  NP
                        the  cat and /  \
                                   Art    N
                                   the   ball

In describing the successive situations of the machine, we shall only indicate what is
written on the two scanned squares and on a few relevant squares to the right of the
scanned square of the storage tape. In fact, all squares to the left of the scanned square
of the storage tape are blank, and so are all squares to the right of the scanned square of
the output tape. The squares of the output tape to the left of the scanned square contain
nothing but past utterances. At no time need the machine remember what it has already
uttered. Its memory (= storage tape) need only contain the plan of what will be uttered
in the future. Well, here are the successive situations of the machine:



(The scanned square of the storage tape is the leftmost one shown; the moves (2) and (3),
which merely shift the storage tape, are suppressed.)

(S | 0) → (0 | S) → (VP | NP) → (N VP | Art) → (N VP | the) → (N VP | 0)
→ (VP | N) → (VP | man) → (VP | 0) → (0 | VP)
→ (NP | V_t) → (NP | hit) → (NP | 0) → (0 | NP)
→ (NP' | NP) → (N NP' | Art) → (N NP' | the) → (N NP' | 0)
→ (NP' | N) → (NP' | cat) → (NP' | 0) → (0 | NP')
→ (NP | Conj) → (NP | and) → (NP | 0) → (0 | NP)
→ (N | Art) → (N | the) → (N | 0) → (0 | N) → (0 | ball) → (0 | 0)

The machine does not come to a stop, but the storage tape moves to the left and keeps 
on doing so, until a nonempty square is scanned, or until we come to the end of the tape. 

It will have been noticed that the entire operation above required only three squares 
of the storage tape. One can easily check that the grammar of the fragment of English 
under consideration does not require more than a very small number of squares of the 
storage tape. For all practical purposes then, our machine is a finite automaton. 

Let us see what happens if we adjoin to our grammar the production

S → if S then S

To make sure the grammar is normal, we must replace this by three productions:

S → if S',    S' → S S'',    S'' → then S.

As we try to generate sentences that begin with if, if if, if if if, etc., we see that more and 
more squares of the storage tape are required. It is a matter of experience that a human 
speaker will get stuck when he tries to utter a sentence starting with if if if. This shows 
that whatever corresponds to the storage tape in the human brain has in fact a quite 
small capacity. 

Surprisingly, the same machine can also be used for recognizing the sentences generated 
by a normal context-free grammar. By this we mean: if a terminal string is entered on 
the output tape (better called "input tape" now) to the right of the scanned square, the 
machine will erase this string and convert it to the initial symbol S on the scanned square 
of the storage tape, provided it adopts the right strategy. 

We adopt the following moves: 

(1') (0 | T) ⇒ (T | 0)

This means: if the scanned square of the storage tape is blank and the scanned square of
the input tape carries the word T, the machine erases T from the input tape and prints
it on the storage tape.

(2') (x | W) ⇒ (x | T), provided T → W is a production of the grammar.

(3') (T | 0) ⇒ (T | 0), the input tape moving one square to the left.

(4') (U | V) ⇒ (0 | W), the storage tape moving one square to the right, provided
W → UV is a production of the grammar.

(5') (U | V) ⇒ (U | V), the storage tape moving one square to the left.

The fourth move, for example, means: erase both scanned words, print W on the input
tape, and move the storage tape one square to the right.

In the situations (0 | T) and (U | V) the machine may find itself in a quandary. Sup-
pose, for example, the machine is in the situation (U | V). When there is no production
W → UV, only move (5') is applicable. When there is a production W → UV, we
prefer to make move (4'), but it may be a bad move, as we shall see in an example.

We assume the sentence the man hit the cat and the ball is entered on the input tape, 
to the right of the scanned square. We shall not show what is written on the input tape, 
except on the scanned square. Here are successive situations of the machine: 







(The scanned square of the storage tape is the rightmost one shown.)

(0 | the) → (0 | Art) → (Art | 0) → (Art | man) → (Art | N) → (0 | NP)
→ (NP | 0) → (NP | hit) → (NP | V_t) → (NP V_t | 0) → (NP V_t | the)
→ (NP V_t | Art) → (NP V_t Art | 0) → (NP V_t Art | cat) → (NP V_t Art | N)
→ (NP V_t | NP)


At this stage we are tempted to make move (4'), but we shall make move (5') instead (we 
abbreviate Conj by C): 



(NP V_t | NP) → (NP V_t NP | 0) → (NP V_t NP | and) → (NP V_t NP | C)
→ (NP V_t NP C | 0) → (NP V_t NP C | the) → (NP V_t NP C | Art)
→ (NP V_t NP C Art | 0) → (NP V_t NP C Art | ball) → (NP V_t NP C Art | N)
→ (NP V_t NP C | NP) → (NP V_t NP | NP') → (NP V_t | NP) → (NP | VP)
→ (0 | S) → (S | 0)


What would have happened if we had followed our first inclination and made move
(4') instead of (5')?

(NP V_t | NP) → (NP | VP) → (0 | S) → (S | 0) → (S | and) → (S | C)
→ (S C | 0) → (S C | the) → (S C | Art) → (S C Art | 0) → (S C Art | ball)
→ (S C Art | N) → (S C | NP) → (S | NP') → (S NP' | 0)

and there's nowhere to go from here.
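The quandary between moves (4') and (5') is just the shift-reduce conflict familiar from parsing, and the simplest (if inefficient) way out is to backtrack. The following Python sketch is ours, not the text's: the table GRAMMAR lists each production backwards, right-hand side first, the stack plays the storage tape, and reductions are tried before shifts, with backtracking undoing bad choices such as the premature move (4') above.

    GRAMMAR = {
        ("the",): "Art", ("man",): "N", ("cat",): "N", ("ball",): "N",
        ("and",): "Conj", ("hit",): "V_t", ("took",): "V_t",
        ("Art", "N"): "NP", ("Conj", "NP"): "NP'", ("NP", "NP'"): "NP",
        ("V_t", "NP"): "VP", ("NP", "VP"): "S",
    }

    def recognize(words, start="S"):
        # `stack` plays the storage tape, `i` counts the input words consumed.
        def step(stack, i):
            if stack == [start] and i == len(words):
                return True                       # storage holds S, input used up
            for rhs, lhs in GRAMMAR.items():      # moves (2') and (4'): reduce
                n = len(rhs)
                if n <= len(stack) and tuple(stack[-n:]) == rhs:
                    if step(stack[:-n] + [lhs], i):
                        return True
            if i < len(words):                    # moves (5'), (1'), (3'): shift
                return step(stack + [words[i]], i + 1)
            return False
        return step([], 0)

    print(recognize("the man hit the cat and the ball".split()))   # True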

Exercise 

1. Show how a machine would (a) produce, (b) recognize the sentence σ²τ² in the
language L₁ of Section 4.2, Example 4.2.2. (Hint: first replace the original context-free
grammar of L₁ by a normal one.)

4.7 Derivations and parsing trees 

For many purposes, proofs in tree form are better replaced by "derivations". A derivation
of C → D is a finite sequence of strings

C = C₀, C₁, ..., C_n = D

where

C_i = E_i A_i F_i,    C_{i+1} = E_i B_i F_i






and A_i → B_i is a production, for each i = 0, 1, ..., n − 1. We say that this derivation
has n steps.

By a derivation of 0 steps we mean an application of the reflexive law: C → C = D.

A derivation of 1 step has the form

C = EAF,    EBF = D

where A → B is a production. Note that C → D can then be proved by the substitu-
tion rule (S'). We shall now give an example of a 2-step derivation. Suppose A₀ → B₀
and A₁ → B₁ are productions and C = E₀A₀GA₁F₁, D = E₀B₀GB₁F₁. Then we have
the derivation

C = C₀ = E₀A₀(GA₁F₁) → C₁ = E₀B₀(GA₁F₁)
                          = (E₀B₀G)A₁F₁ → C₂ = (E₀B₀G)B₁F₁ = D

In this example it is clear that we can prove C → D as follows:

        A₀ → B₀                                A₁ → B₁
 ------------------------ (S')         ------------------------ (S')
 E₀A₀GA₁F₁ → E₀B₀GA₁F₁                  E₀B₀GA₁F₁ → E₀B₀GB₁F₁
 ------------------------------------------------------------- (T)
                  E₀A₀GA₁F₁ → E₀B₀GB₁F₁

Quite generally, if C → D has a derivation, then it also has a proof in tree form. Indeed,
in view of reflexivity and transitivity, we need only recall that this is so for 1-step derivations.
We have thus established one half of the following:

4.7.1 Theorem. A formula C → D is provable in the deductive system D(G) if and
only if it has a derivation.

It remains to prove the implication the other way. Suppose C' → D' is provable,
that is, possesses a proof in tree form. We shall show by induction on the length of this
proof that there is a derivation of C' → D'. If C' = D', we have an instance of the
reflexive law, hence a 0-step derivation.

If C' → D' is a production, there is a 1-step derivation.

Suppose C' → D' is obtained by substitution thus:

        C → D
 --------------------- (S')
 C' = ECF → EDF = D'

By inductional assumption, C → D has a derivation, say:

C = C₀, C₁, ..., C_n = D






Then

C' = EC₀F, EC₁F, ..., EC_nF = D'

is also a derivation, as is easily checked.

Finally, suppose C' → D' is obtained by the transitive law thus:

 C' → E    E → D'
 ----------------- (T)
      C' → D'

Juxtaposing derivations of C' → E and E → D', which may be assumed to exist by
inductional assumption, we obtain a derivation of C' → D'. ∎

As an immediate consequence of the theorem we have the following: 

4.7.2 Corollary. If G = (V, Σ, Σ', P) is a grammar, then the language L(G) consists
of all strings E ∈ Σ* for which there exists a derivation S → E for some S ∈ Σ'.

Note that in generative grammars, there is a unique S ∈ Σ'.

We may now return to Example 4.2.2 of Section 4.2. We wish to determine the
language L(G₁), where G₁ is the grammar with productions

S → στ    S → σSτ

Thus we seek all possible derivations S → E with E ∈ {σ, τ}*. There cannot be a
0-step derivation of S → E, since S is not an element of Σ*. If there is a 1-step derivation,
it must have the form

S = C₀ = E₀A₀F₀,    C₁ = E₀B₀F₀ = E

where A₀ → B₀ is a production. Clearly, A₀ = S, B₀ = στ and E₀ = F₀ = 1. If there is
a 2-step derivation S = C₀, C₁, C₂ = E, we require that

C₀ = E₀A₀F₀,    C₁ = E₀B₀F₀ = E₁A₁F₁,    C₂ = E₁B₁F₁

where A₀ → B₀ and A₁ → B₁ are productions. Clearly, then A₀ = A₁ = S, hence E₀ =
F₀ = 1, and B₀, B₁ must be στ or σSτ. Now B₀ = C₁ = E₁A₁F₁ = E₁SF₁, hence B₀ =
σSτ and E₁ = σ, F₁ = τ. Thus C₂ = σB₁τ = E ∈ {σ, τ}*, and so B₁ = στ. Therefore,
E = C₂ = σσττ = σ²τ².

Continuing in the same way, we see that, if S → E has an n-step derivation, then
E = σⁿτⁿ. If properly done, this proof should be by induction on n.
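This induction can also be cross-checked mechanically. Here is a small Python sketch (ours, not the text's), writing s and t for σ and τ:

    def derive(string, steps):
        # Yield every terminal string obtainable from `string` in at most
        # `steps` applications of the productions S -> st and S -> sSt.
        if "S" not in string:
            yield string
            return
        if steps == 0:
            return
        i = string.index("S")
        for rhs in ("st", "sSt"):
            yield from derive(string[:i] + rhs + string[i + 1:], steps - 1)

    # In at most n steps one obtains exactly the strings s^k t^k with 1 <= k <= n:
    assert set(derive("S", 6)) == {"s" * k + "t" * k for k in range(1, 7)}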

We shall now return to the notion of "parsing tree" , which was informally introduced 
in Section 4.1. While proofs in tree form and derivations are valid techniques for all 
grammars, parsing trees are essentially confined to context-free grammars. (Their use has 
been extended to context sensitive grammars, but these will not be discussed here.) 






In a context-free grammar, each production has the form

A → A₁A₂···A_n

where A, A₁, A₂, ..., A_n are words and n ≥ 1. Take for instance the case n = 3; then we
associate with this production the following picture:

        A
      / | \
    A₁  A₂  A₃

Now suppose we are interested in a derivation A → E. Then either the string E starts
with A₁ or else, at some stage of the derivation, A₁ is replaced with the help of a production

A₁ → A₁₁A₁₂···A₁ₙ₁

Take, for instance, the case n₁ = 2; then we enlarge the picture to

          A
        / | \
      A₁  A₂  A₃
     /  \
   A₁₁   A₁₂

Continuing in this fashion, we ultimately obtain a tree where all the words occurring in 
the string E appear at the end of the branches. Usually we are interested in the situation 
A = S, E G S*. The resulting tree is called a parsing tree of E. 

For example, let us look at the grammar G₀ = (V, {σ, τ}, {S}, P) with vocabulary V =
{σ, τ, S, U, V} and productions P:

(P1) U → σ,  (P2) U → σU,  (P3) V → τ,  (P4) V → τV,  (P5) S → UV.

We shall show that σ²τ is a sentence generated by this grammar

(a) by giving a proof in tree form of the formula S → σστ,

(b) by presenting a derivation,

(c) by exhibiting a parsing tree.



(a)

                U → σU (P2)        V → τ (P3)         U → σ (P1)
                ------------ (S')  ------------ (S')  ------------ (S')
 S → UV (P5)    UV → σUV           σUV → σUτ          σUτ → σστ
 ----------------------------------------------------------------- (T)
                            S → σστ

(the transitive rule (T) being applied three times)

(b)

S, UV, σUV, σUτ, σστ

(c)

        S
      /   \
     U     V
    / \    |
   σ   U   τ
       |
       σ
Exercise 

1. Show that σ³τ² ∈ L(G₀)

a. by a proof in tree form, 

b. by a derivation, 

c. by a parsing tree. 



4.8 A necessary condition for a language to be context-free 

We asserted in Section 4.4 that the intersection of two context-free languages need not be
context-free. Indeed, let us consider the terminal vocabulary Σ = {σ, τ} and the languages

L₃ = {σ^m τ^m σ^p | m ≥ 1, p ≥ 1}

L₄ = {σ^q τ^n σ^n | q ≥ 1, n ≥ 1}

We shall see that L₃ and L₄ have context-free grammars but that L₃ ∩ L₄ does not.

First, observe that L₃ = L₁L₅, where

L₁ = {σ^m τ^m | m ≥ 1}

has already been met in Section 4.2, and

L₅ = {σ^p | p ≥ 1}

has the obvious grammar: S₅ → σ, S₅ → σS₅. Therefore, L₃ is the product of two
context-free languages and thus also context-free, by Theorem 4.4.1. Now L₄ is clearly
the mirror image of L₃, hence also context-free, by the same theorem.

Now it is clear that

L₃ ∩ L₄ = {σ^k τ^k σ^k | k ≥ 1}
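Before proving anything about this intersection, one can at least confirm its description by brute force. A small Python sketch (ours, not the text's), writing s and t for σ and τ:

    import re
    from itertools import product

    def in_L3(w):   # sigma^m tau^m sigma^p with m, p >= 1
        m = re.fullmatch(r"(s+)(t+)(s+)", w)
        return bool(m) and len(m.group(1)) == len(m.group(2))

    def in_L4(w):   # sigma^q tau^n sigma^n with q, n >= 1
        m = re.fullmatch(r"(s+)(t+)(s+)", w)
        return bool(m) and len(m.group(2)) == len(m.group(3))

    # Every string in the intersection has the shape sigma^k tau^k sigma^k:
    for n in range(1, 13):
        for w in ("".join(c) for c in product("st", repeat=n)):
            if in_L3(w) and in_L4(w):
                k = n // 3
                assert w == "s" * k + "t" * k + "s" * k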

We shall see that this language is not context-free. To show this, we require a result by
Bar-Hillel, Perles and Shamir.

4.8.1 Proposition. If L ⊆ Σ* is an infinite language with a context-free grammar,
then there exist strings A, B, C, D, E ∈ Σ* such that AB^n C D^n E ∈ L for all n ≥ 0,
where B and D are not both empty.

For example, in the fragment of English discussed in Section 4.1, we might take

A = the man hit the cat
B = and the man hit the cat
C = and
D = the man hit the cat and
E = the man hit the cat

Then ABCDE, ABBCDDE, ABBBCDDDE, etc. are all grammatical sentences.

We shall not prove this result now, but apply it to the discussion of L = L₃ ∩ L₄.

Let us suppose that L = L₃ ∩ L₄ is a context-free language. Then, by the proposition,
there exist strings A, B, C, D, E such that F_n = AB^n C D^n E ∈ L for all n ≥ 0, and B
and D are not both empty. In particular, F₁ = ABCDE = σ^k τ^k σ^k and F₂ = ABBCDDE =
σ^l τ^l σ^l, for certain k, l ≥ 1. Moreover, since B and D are not both empty, l is strictly
larger than k.

Now since A is an initial segment of F₁, we must have either that A = σ^p for some p ≤ k
or that A begins with the string σ^k τ. But in the latter case, it follows that F₂ = AB···
also begins with the string σ^k τ, which is impossible since F₂ = σ^l··· with l > k. Therefore
A = σ^p, p ≤ k, and, by the mirror image argument, E = σ^q, with q ≤ k. We then get

BCD = σ^{k−p} τ^k σ^{k−q}






from F₁, and

BBCDD = σ^{l−p} τ^l σ^{l−q}

from F₂. If B = σ···τ···, we would obtain

BBCDD = σ···τ···σ···τ···

contradicting the above. Therefore, B is a power of σ or a power of τ. Suppose B = τ^i,
i ≥ 1; then the formulas for BCD and BBCDD tell us that k − p = 0 and l − p = 0,
again leading to the contradiction that k = l. Therefore, B = σ^i, and, similarly, D = σ^j.
Thus, from the formula for BCD, we obtain

C = σ^{k−p−i} τ^k σ^{k−q−j}

and, from the formula for BBCDD, we obtain

C = σ^{l−p−2i} τ^l σ^{l−q−2j}

Thus, again we are led to the contradiction that k = l.

We have now exhausted all the possible cases, so that L₃ ∩ L₄ cannot be a context-free
language.

We shall prove Proposition 4.8.1 after some discussion. 

Let us recall Example 4.1.1 of Section 4.1. The 72 sentences of that language may be
summarized by the formula:

S → (the ∨ a)(man ∨ ball ∨ cat)(hit ∨ took)(the ∨ a)(man ∨ ball ∨ cat)

Indeed, if we regard multiplication as being distributive over ∨, we obtain:

S → (the man hit the cat) ∨ (the man hit the ball) ∨ ···

We have seen how adding the single word and will make the language infinite, provided we
add only the following production: S → S and S. Or we could add the single word poor
and the production N → poor N, and again we would get infinitely many sentences of
the form

S → the (poor)ⁿ man hit the cat,    n = 0, 1, 2, ...

In both cases we have a production of the form T → UTV, where T belongs to the
auxiliary vocabulary and U and V are not both empty. In the first case: T = S, U = 1,
V = and S; in the second case: T = N, U = poor, V = 1. (Recall that 1 = the empty
string.) This idea will be used in the proof of the proposition.

First, we shall establish a lemma. 






4.8.2 Lemma. Let L be a language with a context-free grammar G. Then, without loss
of generality, we may assume that, in any production A → B, either B is a terminal
word or it is a string containing at least two words.

Proof. For the purpose of this proof, let us call the nonterminal word B "bad" if it
occurs alone on the right hand side of a production of G. Let

A₁ → B,  A₂ → B,  ...,  A_m → B

be a complete list of these productions. We also make a list of all productions which
contain the bad word on the left hand side:

B → C₁,  B → C₂,  ...,  B → C_n

where the C_j are strings. We may also assume that all C_j ≠ B, as B → B is an instance
of the reflexive law and need not be given as a production.

We now form a new grammar G'. This differs from G in only one respect: the m pro-
ductions A_i → B and the n productions B → C_j are replaced by the mn productions
A_i → C_j.

We note first of all that G' has one less bad word than G. For none of the C_j is B,
and if C_j is bad in G', that is, a nonterminal word, then it was already bad in G.

We note secondly that G' generates the same language L as does G. For suppose
the production A_i → C_j is used in showing that S → D is a theorem in D(G'), for
some terminal string D; then A_i → B and B → C_j may be used to establish the same
theorem in D(G). Conversely, suppose the production A_i → B is used in showing that
S → D is a theorem in D(G), D being a terminal string; then the parsing tree of D
must contain as a part:

        A_i
         |
         B
      /  |  \
    E₁  E₂ ··· E_k

where E₁E₂···E_k = C_j for some j. Replacing this part by

        A_i
      /  |  \
    E₁  E₂ ··· E_k






we obtain a parsing tree for D in G', which utilizes the production A_i → C_j in place of
A_i → B.

Thus we have replaced the grammar G by another grammar G' which generates the
same language as G, but which contains one bad word less. Repeating this process, we
ultimately get rid of all bad words without changing the language generated. ∎

We are finally able to give a proof of Proposition 4.8.1. By the depth of a parsing
tree, we mean the length of the longest downward path you can follow along the tree. It
corresponds to the number of steps in a derivation.

Proof. Let L be an infinite language with a context-free grammar that satisfies the
conclusions of Lemma 4.8.2. In what follows, let

m be the maximal length of all strings B which may appear on the right hand side
of a production A → B, and

p be the total number of words in the vocabulary V.

We will now bound the number of parsing trees of depth at most l. The number
of trees of depth 0 is 2 (including the empty tree) and the number of trees of depth at
most 1 is m + 2, since the only such trees are the 2 of depth 0 and the ones of depth
1 with the top node and a k-fold branch with 1 ≤ k ≤ m. Let f(l) denote the number
of trees of depth at most l. Then from the fact that a tree of depth at most l + 1 can
be gotten (usually in more than one way) by replacing each of the bottom nodes of one
of the m + 2 trees of depth at most one by a tree of depth at most l, and there are at
most m such bottom nodes, the number of possibilities is at most f(l)^m. In particular,
we have that f(2) ≤ (m + 2)^m, which gives f(3) ≤ f(2)^m ≤ ((m + 2)^m)^m = (m + 2)^{m²},
and in general f(l) ≤ (m + 2)^{m^{l−1}}.

A parsing tree is gotten by replacing each node in a tree by a word of the vocabulary.
Since each node of a parsing tree may be occupied by at most p words, the total number
of parsing trees of depth at most l cannot exceed p^{f(l)}; hence there cannot be more than
that number of sentences in the language L whose parsing trees have depth at most l.
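For the record, the induction behind the general bound can be spelled out in one line:

    f(2) \le (m+2)^m, \qquad f(l+1) \le f(l)^m \le \bigl((m+2)^{m^{l-1}}\bigr)^{m} = (m+2)^{m^{l}}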

Given that the number of sentences in L is infinite, it follows that there must be
parsing trees of arbitrary depth. Therefore, there exist parsing trees with branches of
arbitrary length. In particular, there will exist a parsing tree with a path of length q > p.
For convenience, we shall picture this path horizontally:

S = C₀ — C₁ — C₂ — ··· — C_{q−1}

a path in the tree, with each C_i ∈ V. Since the total number of words in the vocabulary
is p < q, the words appearing in this branch cannot all be different; say C_i = C_j, where
i < j ≤ q − 1.

Now for each k such that i ≤ k < j there is a production

C_k → B_k C_{k+1} D_k






with B_k, D_k ∈ V*. Since we have supposed that G satisfies the conclusions of Lemma 4.8.2,
it follows that at least one of B_k or D_k is nonempty. In particular, either B_i or D_i is non-
empty. We can therefore deduce a (j − i)-step derivation

C_i → B'C_jD'

where B' = B_iB_{i+1}···B_{j−1} and D' = D_{j−1}···D_i, and at least one of B' and D' is non-
empty. Putting C' = C_i = C_j, we obtain a theorem

C' → B'C'D'    (1)

Similarly we may obtain a theorem

S → A'C'E'    (2)

where A' and E' are strings that might be empty. From (1) and (2) we deduce

S → A'B'C'D'E' → A'B'B'C'D'D'E'

and so on; in fact,

S → A'(B')ⁿC'(D')ⁿE'    (3)

for all n ≥ 0.

We have almost proved the proposition, except that there is no reason for A', B', C',
D' and E' to be terminal strings. It remains to be shown that they can be replaced by
terminal strings in (3).

Take for instance B' = B_iB_{i+1}···B_{j−1}. Now B_i occurs in a parsing tree, hence there
is a theorem

B_i → β₁β₂···β_{n_i}

in our deductive system, where the β_r are the terminal words appearing at endpoints of
branches emanating from B_i. Applying the same reasoning to B_{i+1}, ..., B_{j−1}, we obtain a
theorem

B' → β₁β₂···β_n = B

Thus there are theorems

A' → A,  B' → B,  C' → C,  D' → D,  E' → E    (4)

where A, B, C, D and E are terminal strings. Since our grammar is context-free, B
contains at least as many words as B' and D at least as many as D'. Since B' or D' is
known to be nonempty, it follows that B or D is nonempty. Finally, from (3) and (4), we
obtain the theorem

S → AB^n C D^n E

for all n ≥ 0, and our proof is complete. ∎



Chapter 5

Generative and transformational grammars

5.1 Inflection rules: conjugation

Traditional grammar distinguishes between syntax and morphology, the former dealing 
with the formation of sentences from words, while the latter deals with the formation of 
words themselves. By now the reader should be convinced that context-free grammars 
constitute a useful tool for dealing with syntax. Let us now look at such rules as: 

Past work → worked
Plur man → men
Part be → being

If we count Past, Plur and Part among the words of the auxiliary vocabulary, these 
rules clearly do not fit into a context-free grammar, or even a context-sensitive gram- 
mar, as the left hand side of these productions contains more words than the right hand 
side. Unfortunately, English is still an inflected language, and any grammar of English 
must incorporate such inflection rules, or "morphographemic" rules, as it may be more 
fashionable to call them. 

Let us begin by looking at the inflected forms of the English verb. We note that for
each verb V there are 6 forms C_ki(V), where k = 1, 2 refers to the tenses present and past,
and i = 1, 2, 3 refers to the three persons usually associated with the pronouns "I", "you"
and "he" or "she". These 6 forms are conveniently arranged in the form of a matrix:



           ( C₁₁(V)  C₁₂(V)  C₁₃(V) )
C(V) =     ( C₂₁(V)  C₂₂(V)  C₂₃(V) )


Here are some examples:

              ( work     work     works   )
C(work) =     ( worked   worked   worked  )

              ( go       go       goes    )
C(go) =       ( went     went     went    )

              ( can      can      can     )
C(can) =      ( could    could    could   )

              ( must     must     must    )
C(must) =     ( must     must     must    )

              ( am       are      is      )
C(be) =       ( was      were     was     )

A few comments are in order. 

1. English verbs are usually presented in the infinitive, except modal auxiliaries such as 
can and must, which do not have an infinitive. 

2. We have mentioned two simple tenses: present and past. Compound tenses will be 
dealt with in the next section. There are two more simple tenses in English: the present 
and past subjunctive. However, these occur so rarely that we have chosen to ignore 
them here. 

3. We have mentioned three persons of the singular. What about the three persons of 
the plural? We may safely identify these with the second person of the singular at this 
stage of our investigation, as the verb forms of the plural always coincide with those of 
the second person singular. 

4. The conjugation matrix for work is regular and serves as a model for most English
verbs. Any departure from this model ought to be listed in the dictionary.

5. Actually, at most five of the six forms mentioned are distinct, as always C₂₁(V) =
C₂₃(V). The maximum diversity is achieved for V = be, and for V = must all six
forms coincide.

After this preliminary discussion we are able to state productions which will generate 
some elementary English sentences, due attention being paid to the morphology. 

(1) S → Subj Pred

(2) Subj → NP_i P_i,    (i = 1, 2, 3)

(3) NP₁ → I







    NP₂ → you, we, they
    NP₃ → he, she, it, one, Name
    Name → John, Jane, ...

(4) Pred → Tense VP

(5) Tense → T₁, T₂

(6) VP → V_intr, V_tr Obj, ...

(7) V_intr → work, rest, go, ...
    V_tr → like, call, hit, take, ...

(8) Obj → Acc NP_i,    (i = 1, 2, 3)

(9) P_i T_k V → C_ki(V),    (i = 1, 2, 3; k = 1, 2), for any verb V

(10) Acc I → me
     Acc he → him
     Acc she → her
     Acc we → us
     Acc they → them
     Acc X → X otherwise

The auxiliary vocabulary consists, aside from Name and Tense, of the following ab-
breviations:

S      = statement
Subj   = subject
Pred   = predicate
NP_i   = ith person noun phrase
P_i    = ith person
T₁     = present
T₂     = past
VP     = verb phrase
V_intr = intransitive verb
V_tr   = transitive verb
Obj    = object
Acc    = accusative case


For the moment, V is called a "verb" if there is a production V_intr → V or V_tr → V.
Other verbs will be discussed later. The distinction between transitive and intransitive
verbs is complicated by the fact that some intransitive verbs can be used transitively, as in
I work him hard, and some transitive verbs permit object deletion, as in she called today,
but we shall ignore these complications here.






We note that productions (1) to (8) satisfy the definition of a context-free grammar;
only productions (9) and (10) do not. For example, let us see how we can generate
the sentence she works. Staying within the context-free part of the grammar, we may
construct the following parsing tree:

                 S
               /   \
           Subj     Pred
           /  \     /   \
        NP₃    P₃ Tense   VP
         |          |      |
        she        T₁   V_intr
                           |
                         work

This parsing tree yields only the pseudo-sentence:

she P₃ T₁ work

To convert this into a terminal string, we apply rule (9):

P₃ T₁ work → C₁₃(work) = works
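Rule (9) is easy to mechanize. In the following Python sketch (ours, not the text's; the matrix for like is filled in on the regular model of work, which is an assumption), rows are indexed by the tense k and columns by the person i:

    # Conjugation matrices C(V) as 2 x 3 tables: row k = 1 (present), 2 (past);
    # column i = 1, 2, 3 (the persons of "I", "you", "he"/"she").
    C = {
        "work": [["work", "work", "works"],
                 ["worked", "worked", "worked"]],
        "like": [["like", "like", "likes"],
                 ["liked", "liked", "liked"]],
        "be":   [["am", "are", "is"],
                 ["was", "were", "was"]],
    }

    def inflect(i, k, verb):
        # Rule (9): the pseudo-string P_i T_k V rewrites to C_ki(V).
        return C[verb][k - 1][i - 1]

    print(inflect(3, 1, "work"))   # P3 T1 work -> C13(work) = works
    print(inflect(1, 2, "be"))     # P1 T2 be   -> C21(be)   = was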

Here is another example:

                 S
               /   \
           Subj      Pred
           /  \      /   \
        NP₁    P₁ Tense   VP
         |          |    /  \
         I         T₂  V_tr  Obj
                         |   /  \
                       like Acc  NP₃
                                  |
                                 she

The inflection rules used here are:

P₁ T₂ like → C₂₁(like) = liked,    Acc she → her






Exercises 

1. Consider the following sentences: 

a. I like her 

b. she likes me 

c. we worked 

d. John took it 

Construct the underlying parsing trees and show which productions yield the terminal 
strings. 

2. Suppose we add to the above productions the following:

NP₂ → NP_i and NP_j

Show that the following four strings then become sentences:

a. John and I work

b. I and John work

c. she likes me and John

d. she likes John and I

3. Suppose, in place of the production in Exercise 2, we postulate the following two
productions:

Subj → NP_i and NP_j P₂

Obj → Acc NP_i and Acc NP_j

Show that (a), (b) and (c) in Exercise 2 remain sentences, but that (d) will be replaced 
by: 

(d') she likes John and me 

5.2 More about tense and verb phrase 

While Section 5.1 has served to illustrate some general principles, only the simplest sen- 
tences can be generated by the productions considered so far. In particular, we have only 
looked at two simple tenses: present and past. However, even aside from the subjunctive, 
which we have chosen to ignore, there are lots of compound tenses. We shall now amplify 
production (5) of Section 5.1: 

(5.1) Tense → T_i Mod Asp₁ Asp₂,    (i = 1, 2)

(5.2) Mod → 1, V_mod

(5.3) Asp₁ → 1, have Perf

(5.4) Asp₂ → 1, be Part





(5.1) to (5.4) are usually summarized as follows:

Tense → T_i (V_mod)(have Perf)(be Part)

We require the following further productions:

(5.5) V_mod → must, may, can, shall, will

(5.6) Part work → working
      Part take → taking
      Perf work → worked
      Perf take → taken
      Perf be → been

We have used a number of new auxiliary words:

Mod   = modality
V_mod = modal auxiliary verb
Asp_i = aspect of type i, (i = 1, 2)
Perf  = past participle (= perfect)
Part  = present participle

(The word "perfect" is borrowed from Latin grammar, where it stands for completed
action. Curiously, in English it is the simple past that is perfective in that sense, while it
is the compound past, whose meaning is generally imperfective, that is called the perfect.
This illustrates the folly of attempting to impose the grammar of one language on
another.)

The class of verbs has now been enlarged by V = have and V = be, and by such V for
which there are productions V_mod → V. It should be pointed out that some verbs, such
as like, are not supposed to admit a progressive form, although one hears such sentences
as how are you liking your new dress?



(Parsing tree for the sentence she has been taking it, expanding Tense by (5.1) with
Mod → 1, Asp₁ → have Perf and Asp₂ → be Part.)

This parsing tree yields a pseudo-sentence, which is converted into a terminal string
as follows:

she P₃ T₁ have Perf be Part take Acc it
→ she has been taking it

Now let us turn attention to the "verb phrase" introduced in Section 5.1. (It should 
be pointed out that we have used this term more narrowly than other authors, whose VP 
comprises tense and person.) The following expansion of rule (6) of Section 5.1 gives only 
a very partial account of the possible verb phrase constructions in English: 

(6)   VP → V_intr, V_tr Obj, V_cop Adj, V_give Obj Obj, V_give Obj to Obj, V_make Obj Adj

(6.1) V_cop → be, stay, seem, look, ...

(6.2) Adj → poor, old, good, green, ...

(6.3) V_make → make, render, leave, ...

(6.4) V_give → give, teach, tell, show, ...

Here

V_cop  = copula
Adj    = adjective
V_make = verb like "make"
V_give = verb like "give"

The class of verbs has been enlarged, and we might summarize all the verbs introduced
hitherto by a production:

Verb → V_intr, V_tr, have, be, V_mod, V_cop, V_make, V_give, ...






There is some duplication in this list, as for example V_tr → have. Moreover, V_make and
V_give seem to be causally related to V_cop and V_tr: for example, make means cause to be,
give means cause to have, teach means cause to know, etc.

As usual, we have neglected to specify what types of objects can be taken by these 
verbs. For example, in 

X teaches Y to Z 

which is equivalent to 

X teaches Z Y 

Z normally denotes a person or an animal, and Y denotes something but not somebody, 
while in 

Z knows Y 

Y may denote somebody as well as something. Many grammarians try to incorporate 
such restrictions into the grammar. However, it is difficult to be consistent. One author 
recently asserted that the subject of eat must be animate. Unfortunately, he went on to 
argue his case by saying that stones do not eat, thus inadvertently using the construction 
which he claimed is ungrammatical. 

Here is one illustration of the expanded version of (6):

                 S
               /   \
           Subj     Pred
           /  \     /   \
        NP₃    P₃ Tense    VP
         |          |     /   \
        she        T₁  V_cop   Adj
                         |      |
                       look   good

Applying the inflection rule P₃ T₁ look → C₁₃(look) = looks, this yields she looks good.

Here we did not make use of the expanded version of (5), or we should have replaced
the Tense part of the parsing tree by:



          Tense
        /  |   |  \
      T₁  Mod Asp₁ Asp₂






Exercises 

1. For the following sentences construct the underlying parsing trees and show how the 
terminal strings are derived: 

a. he may have hit her 

b. she had hit him 

c. she would hit him 

d. I was resting 

e. you should have been working 

2. Suppose you amplify (5.4) by introducing the productions:

Asp₂ → V_prog Part

V_prog → be, start, stop, keep

Show that the following are sentences: 

a. he has stopped hitting her 

b. she kept calling him 

3. Construct the underlying parsing trees of the following: 

a. they left her poor 

b. I teach it to her 

c. she could have shown me it 

5.3 More about the noun phrase 

The only noun phrases we had admitted in Section 5.1 were names and pronouns. This 
is of course inadequate: we must also consider noun phrases built up from nouns. The 
problem is that some nouns must and other nouns may be preceded by an article or 
another determiner. 

Aside from pronouns and names, there are in fact several different kinds of nouns:

1. Count nouns (= N_count) such as man, cat, ball, etc. They require an article or
another determiner and admit plural formation.

2. Plurals (= N_plur) may be plurals of count nouns, e.g. men, cats, balls, etc., or may not
possess a singular at all, e.g. scissors. They admit the definite article the, but not the
indefinite article a(n).

3. Substance nouns (= N_subst) such as milk, soap, bread, etc. They admit the definite but
not the indefinite article.






If a short digression is permitted, we shall make some remarks of a historical and 
philosophical nature on the difference between count nouns and names of substances. 

One can say one man, two cats, three balls, etc., but one must say one glass of milk, 
two cakes of soap, three loaves of bread, etc. Some similar linguistic phenomenon in an- 
cient Greek presumably induced Aristotle to develop his theory of substance and form. 
The idea is simply this: when talking about a loaf of bread, bread denotes the substance 
and loaf the form. Other ancient philosophers were led to ask: how many substances are 
there in nature? The traditional answer was: four, namely, earth, water, air and fire; but 
each of these had its champions who proclaimed it to be the only substance. Then there 
was the theory that substance=mat(t)er is female, while the form is male, but this takes 
us too far afield. 

Whether something is denoted by a count noun or is regarded as a substance is often
a question of size. The English language decrees that beans can be counted, but that
rice cannot, unless in the form of grains. At one time peas were regarded as too small for
counting and there was an old substance noun pease, as in pease porridge. However, later
peas(e) was regarded as the plural of the newly coined word pea. Perhaps some day rice
will be considered as the plural of rouse.

Articles are not the only kind of determiners. In fact we shall distinguish several types
of determiners:

D_count = determiners which may be used for count nouns
D_subst = determiners which may be used for substance nouns
D_plur  = plural determiners, which may be used for plurals
D_sing  = singular determiners, which may be used for count and substance nouns
D_univ  = universal determiners, which may be used for count nouns, substance
          nouns and plurals.

Rule (3) of Section 5.1 should now be supplemented by the following new productions:

(3.1) NP₂ → D_plur N_plur

(3.2) NP₃ → D_count N_count, D_subst N_subst, somebody, something, ...

(3.3) D_plur → D_univ, these, those, 1

(3.4) D_univ → the, my, his, her, your, some, no, ...

(3.5) D_count → D_sing, a(n), one, each, every

(3.6) D_sing → D_univ, this, that

(3.7) D_subst → D_sing, 1

(3.5) must be supplemented by the following rule:

(3.8) a(n) X → an X if X starts with a vowel,
      a(n) X → a X if X starts with a consonant.

However, (3.8) refers to the pronunciation of X and not to its spelling.
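Rule (3.8) can be approximated in code, though only approximated: the following Python sketch (ours, not the text's) keys on spelling, whereas the rule explicitly refers to pronunciation.

    def indefinite(x):
        # Spelling-based stand-in for rule (3.8); a pronunciation dictionary
        # would be needed to get "an hour" and "a unicorn" right.
        article = "an" if x[0].lower() in "aeiou" else "a"
        return article + " " + x

    print(indefinite("man"))     # a man
    print(indefinite("apple"))   # an apple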






In the following example, we make use of the grammar of Section 5.1, but not of the
grammar of Section 5.2.

S
  Subj
    NP₃
      D_count → D_sing → D_univ → the
      N_count → cat
    P₃
  Pred
    Tense → T₁
    VP
      V_tr → like
      Obj
        Acc
        NP₃
          D_subst → D_sing → D_univ → your
          N_subst → milk

This yields the pseudo-sentence the cat P₃ T₁ like Acc your milk and hence, by rules
(9) and (10), the sentence the cat likes your milk.


While we have distinguished between substance nouns and count nouns, we have not 
distinguished between masculine and feminine, animate and inanimate, etc. Thus we 
cannot rule out questionable sentences such as milk likes your cat. 

There are many other questions that we have left open. Where do the numerals and 
the word all belong? What about your two cats, all her cats, etc.? 






Exercises 

1. Utilizing the grammar of Sections 5.1 and 5.3, construct the underlying parsing trees 
for the following sentences and show how the terminal strings are derived: 

a. each man works 

b. the man hit my cat 

2. Utilizing also the grammar of Section 5.2, do the same for 

a. your cat was resting 

3. What additional productions are required to generate the following sentences: 

a. this is good 

b. I like that 



5.4 The passive construction and transformational grammar 

We shall discuss the English passive construction, first within generative grammar, and 
then from the point of view of transformational grammar, it being our intention to intro- 
duce the latter concept in this section. 
Everyone knows that the sentences

the man (takes | took | will take) the ball

give rise to the passive forms

the ball (is taken | was taken | will be taken) by the man

the prepositional phrase by the man being optional. It is clear from these examples that
the passive of a sentence depends on its structure, that is, on its underlying parsing tree.

Corresponding to each sentence derived by

(1) S → NP_i P_i Tense V_tr Acc NP_j

where i, j = 1, 2, 3, we may form quite generally

(2) S → NP_j P_j Tense be Perf V_tr (by Acc NP_i)

Terminal strings derived from (2) may be regarded as passives of terminal strings
derived from (1). From the viewpoint of generative grammar, we are not interested in the
relationship between these two terminal strings, but only in the grammaticalness of the
former.






The easiest way to assure that (2) and all consequences of (2) are theorems in the
deductive system D(G) is to postulate the following new productions, augmenting rule (6)
of Section 5.2:

(6.5) VP → be Perf V_tr Ag
      Ag → by Obj

where

Ag = agent

Here is an example of a passive construction based on (6.5):

(Parsing tree of a sentence whose predicate, via the compound tenses of (5.1) and rule
(6.5), comes out as had been being hit.)

(Stylistically the resulting sentence may be improved if we replace being by getting, as in 

Exercise 2 at the end of this section.) 

While (6.5) only deals with transitive verbs, verbs like make and give also admit
passives, for example

he was rendered poor by her
she is shown it
it is shown to her

We are therefore led to supplement (6.5) as follows (see Section 5.2 for the verbs
involved):

(6.6) VP → be Perf V_make Adj Ag
      VP → be Perf V_give Obj Ag
      VP → be Perf V_give to Obj Ag

One could go on in this way, but perhaps a more radical approach is called for, which
will account for all these rules at once. Here is one suggestion:

(6.5*) VP → Pass VP
       Pass V → be Perf V Ψ
       Ψ Obj → 1




Here Pass = passive, Ψ is an auxiliary symbol, and V is assumed to be any verb which
possesses a passive. This last proviso could be used to prevent us from forming the passive
of the following sentences:

the play lasts three hours 
this colour becomes you 

he went the whole way 

This approach could be extended to account for sentences such as 

a new idea is called for 

but we shall not pursue it any further. 

We have discussed the construction of passive sentences within the framework of gener- 
ative grammar. However, this approach does not explain what sentence a passive sentence 
is the passive of. It does not explain why it is that in 

Y is eaten by X 

X normally denotes an animal or a person just as in 

X eats Y 

It does not explain why in both sentences Y may be soup in Canada and tea in England. 

For these and other reasons, Chomsky has attempted to view the passive construction
as a "transformation". According to this view, a transformation transforms one pars-
ing tree into another. Thus, the passive transformation corresponding to (6.5) would
transform the parsing tree

                 S
           _____|_____
        NP_i  P_i  Tense   VP
         |           |    /  \
         X           T  V_tr  Obj
                          |   /  \
                          V  Acc  NP_j
                                   |
                                   Y






into the parsing tree

                 S
           _____|_____
        NP_j  P_j  Tense        VP
         |           |     ____|____
         Y           T    be Perf V_tr Ag
                                   |    |
                                   V  by Obj
                                         |
                                      Acc NP_i
                                           |
                                           X

Here X and Y are terminal strings, V is a terminal word and T is a string made up of
terminal words and inflections (T_k, Perf and Part). The branches to those strings are
really abbreviations. Thus

   NP_j
    |
each man

stands for

       NP_j
      /    \
 D_count   N_count
    |         |
  each       man
How can we interpret this process in more familiar mathematical terms? One way,
which is close to Chomsky's original view, would be to regard it as a rule of inference as
follows:

 S → X P_i T V Acc Y,   NP_i → X,   Tense → T,   V_tr → V,   NP_j → Y
 ---------------------------------------------------------------------
               S → Y P_j T be Perf V (by Acc X)

A rule of inference in a deductive system is a rule that says that if you have derived
a certain set of strings, then you may derive another string. The rules (S) and (T) of
Section 4.2 are rules of inference.






In the present instance the new rule says that if there are productions that derive the
string X P_i T V Acc Y from S, the string X from NP_i, the string T from Tense, the
string V from V_tr (which means that V is a transitive verb) and the string Y from
NP_j, then derive the string Y P_j T be Perf V (by Acc X) from S.
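At the level of pseudo-sentences the rule is a simple string transformation, as the following Python sketch (ours, not the text's) shows; the inflection rules (9) and (10) still have to be applied to its output:

    def passive(X, i, T, V, Y, j, agent=True):
        # From S -> X P_i T V Acc Y infer S -> Y P_j T be Perf V (by Acc X).
        tail = ["by", "Acc"] + X if agent else []
        return Y + ["P%d" % j, T, "be", "Perf", V] + tail

    print(" ".join(passive(["the", "man"], 3, "T2", "take", ["the", "ball"], 3)))
    # the ball P3 T2 be Perf take by Acc the man
    # ... which rules (9) and (10) turn into: the ball was taken by the man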

Recall that a production A → B of the generative grammar G was regarded as an
axiom in the deductive system D(G) in Chapter 4. We may now define a transfor-
mational grammar G' as stipulating not only axioms (productions) but also rules of
inference (= grammatical transformations) for a deductive system D(G'). We replaced
(6.5) by a rule of inference, and we can do the same for each of the productions (6.6).
One is tempted to combine all the resulting rules of inference into one, as follows:

 S → X P_i T V Acc Y Z,   NP_i → X,   Tense → T,   V_pass → V,   NP_j → Y
 --------------------------------------------------------------------------
               S → Y P_j T be Perf V Z (by Acc X)

where

V_pass = verb possessing a passive

There are however some difficulties concerning Z, the structure of which should be spec- 
ified in some way. We do not want to transform 

the cat chased John and me 

into 

*John was chased and me 

or 

I saw the father of my friend 

into 

*the father was seen of my friend 

Apparently it is not easy to formulate the correct restrictions on Z, and so perhaps the 
pedestrian approach of listing all the rules separately is to be preferred. 

Here is another approach to the generation of passives. From such examples as

I found the door closed

there is a well motivated need for a rule of the form

Adj → Perf VP⁻

where VP⁻ stands for a verb phrase with its direct object (or its first object if it takes
more than one) deleted. Assuming such a rule, we then have productions such as

S → NP V_cop Adj
  → Art N V_cop Perf VP⁻
  → the book was given to Mary





where, since we have a production

VP → give the book to Mary

we also have a production

VP⁻ → give to Mary

If we apply the same rule to

VP → give Mary a book

we derive instead the sentence

Mary was given a book

This proposal is also most simply formulated as a rule of inference. We leave the
details to the reader. The point of this proposal is that it does not add to the language any
rule that is not required in any case. It derives the passive as a special case of two rules:
the one that gives predicative adjective sentences and the one that turns a past participle
into an adjective. The agent phrase, if any, is derived by the rule that says

S → S PP

where PP stands for prepositional phrase. This explains the optional nature of the agent
clause.



Commentary by MB 

I am not entirely convinced by this analysis of the passive. First, it is well motivated that
English has a rule of the form Adj → Part, since so many participles function as pure adjectives
(however, notice the contrast between "The door was open" and "The door was opened"; but
there are not two words in the case of "closed"). At any rate, it is not necessary to posit a
passive transformation, since a sentence such as the door was closed by him is generated by the
productions

S → S PP → NP VP PP → Art N V_cop Adj Prep Pron → the door was closed by him

It would be harder, however, to fit the negative and interrogative into the same mold as the 
declarative. And many languages would have to be studied to determine if this kind of analysis 
of the passive is general. It is interesting to note that this passive "transformation" was one 
of the main motivations for transformational grammar, but has now been largely abandoned. 
Another point in favor of the transformational approach is that it gives a ready explanation for 
the semantic relation of the active and passive modes, which my account does not. The jury is 
still out. 






Exercises 

1. Using (6.5), exhibit the underlying parsing trees of the following sentences: 

the ball may have been taken by her 

she is being hit by the ball 

2. Discuss a modification of (6.5) which will allow two passive auxiliaries, be and get,
the latter as in she got hit.

3. Show that the proposal of the last paragraphs allows even more passives, such as 

the door stayed closed 

the book seemed damaged by the rain 

The last is perhaps marginal, but not obviously bad. 

5.5 Some other transformations 

English abounds with constructions that are usually viewed as grammatical transforma- 
tions. We shall content ourselves with a few examples. However, we shall not insist 
on treating a construction by the methods of transformational grammar if generative 
grammar will do. 

5.5.1 EXAMPLE The negative transformation assigns to any English sentence its 
"grammatical" negation. This should be distinguished from the logical negation, with 
which it usually coincides, but not always. For example, the grammatical negation of 

John must work 

is 

John must not work 

while its logical negation is 

John need not work 

We find it most convenient to formulate the negative transformation as a set of produc- 
tions. Writing 

Neg = negative 

we postulate 






(11) Mod → Neg Mod

(12) Neg have Perf → have not Perf

(13) Neg have Obj → do not have Obj

(14) Neg V → V not, if V = be, must, may, can, shall, will

(15) Neg V → do not V, if V ≠ have, be, must, ...

These productions are supplemented by certain optional contraction rules:

(16) is not → isn't
     was not → wasn't
     ...
     can not → cannot, can't

which also involve transfer of stress.
which also involve transfer of stress. 

Note that Neg have cannot be evaluated until we know whether have is used as an 
auxiliary or as a transitive verb. Admittedly, our rules do not explain 

I haven't the time 

she hasn't any wool 

but it would be difficult to admit these sentences without also admitting 

*the cat hasn't the ball 

For example, in view of (11), we have the parsing tree

S
  Subj
    NP₃ → he
    P₃
  Pred
    Tense
      T₂
      Mod → Neg Mod, the inner Mod → 1
      Asp₁ → 1
      Asp₂ → be Part
    VP
      V_intr → work

This yields the pseudo-sentence

he P₃ T₂ Neg be Part work






In view of (14), this becomes

he P₃ T₂ be not Part work

By the usual inflection rules, this reduces to

he was not working

An optional contraction rule (16) converts this to

he wasn't working
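The negative transformation, too, is easily mechanized at the pseudo-sentence level. A Python sketch (ours, not the text's; contraction (16) and the inflection rules are omitted):

    MODALS = {"be", "must", "may", "can", "shall", "will"}

    def negate(tokens):
        # Rewrite the first Neg by productions (12)-(15).
        i = tokens.index("Neg")
        head, v, rest = tokens[:i], tokens[i + 1], tokens[i + 2:]
        if v == "have" and rest[:1] == ["Perf"]:
            return head + ["have", "not"] + rest        # (12)
        if v == "have":
            return head + ["do", "not", "have"] + rest  # (13)
        if v in MODALS:
            return head + [v, "not"] + rest             # (14)
        return head + ["do", "not", v] + rest           # (15)

    print(" ".join(negate("he P3 T2 Neg be Part work".split())))
    # he P3 T2 be not Part work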



5.5.2 EXAMPLE The emphatic transformation is similar to the negative transforma- 
tion. In place of Neg it has Emph and in place of not it has stress, to be placed on the 
preceding syllable. The stress is not shown in written English (but see the last problem 
of this section). 

5.5.3 EXAMPLE. The interrogative transformation forms a question from a statement,
asking if the statement is true. For technical reasons, we begin by negating the statement
and express the interrogative transformation as a rule of inference:

 S → X C_ki(V) not Y,   NP_i → X,   V_x → V
 --------------------------------------------
          Q → C_ki(V) X Y ?

Here V_x stands for one of the various classes of verbs and Q = question. The latter is
a sentence type that ought to be distinguished from S, the type of statements; as, for
instance, one cannot put and between a question and a statement. The rule above refers
to written English and ignores the rising intonation.

As an illustration, let us look at the pseudo-sentence

he P₃ T₂ be not Part work



that was discussed in Example 5.5.1. Recalling that

P_i T_k V → C_ki(V)

we obtain

S → he C₂₃(be) not working

Applying the rule of inference, with X = he, V = be and Y = working, we infer

Q → C₂₃(be) he working?






that is,

Q → was he working?

Our rule explains the question 

do you have the time? 

but not 

have you the time? 

If it is desirable to include the second question as grammatical, the rule could be modified. 
In case one of the rules (16) allows a contraction

C_ki(V) not → C_ki(V)n't

our rule should be supplemented by the following:

 S → X C_ki(V)n't Y,   NP_i → X,   V_x → V
 -------------------------------------------
          Q → C_ki(V)n't X Y ?

This yields, for instance,

Q → wasn't he working?

5.5.4 Example Wh-question. Given a sentence containing he (or she) one may form a 
question by changing the pronoun to who and moving it to the beginning of the sentence, 
unless it was there already. However, if who is moved, the interrogative transformation of 
Example 5.5.3 must be applied. For example, from (the underlying parsing tree of) 

he works 

one obtains 

who works? 

and from 

this is he 

one obtains 

who is this? 

and from 

you say he drinks 

one obtains 

who do you say drinks? 






We may try to summarize these facts with the help of two rules of inference:

          S → he P₃ X,   Pred → X
(Wh-1)    ------------------------
             Q → who P₃ X ?

          S → X C_ki(V) not Y he Z,   NP_i → X,   V_x → V
(Wh-2)    ------------------------------------------------
             Q → who C_ki(V) X Y Z ?

(As in Example 5.5.3, it is convenient to start with the negative.)

There are some difficulties with (Wh-2), unless the structures of Y and Z are specified.
For example, (Wh-2) would allow us to infer from the negative sentence

I do not know what he will say

the question

*who do I know what will say?

and from the negative sentence

this is not John and I

the question

*who is this John and?

Supposing these difficulties can be overcome (and various suggestions have been made 
for doing so), one may treat in a similar way questions starting with whom, what, where, 
etc., which are obtained by replacement of him (or her), it, here, etc., respectively. 



Exercises 

1. Exhibit the underlying parsing trees of the following sentences: 

I cannot call you 

she does not work 

he has not been resting 

you were not called by me 

she is not clever 






2. Show how the interrogative transformation may be used to obtain the following 
questions: 

can I call you? 
does she work? 
has he been resting? 
does he have the ball? 
were you called by me? 
is she clever? 

3. Examine the possibility of formulating the rules of emphasis and contraction in such
a way that the contraction rule is obligatory, unless its operation is blocked by the not
being emphasized. There is evidence for this suggestion in that, whenever contraction is
not applied, the not is emphasized in normal speech.






References 



Y. Bar-Hillel, M. Perles and E. Shamir, On formal properties of simple phrase structure
grammars. Z. Phonetik Sprachwiss. Kommunikat. 14 (1961), 143-172. Reprinted in
Luce et al., Vol. 3.

L. S. Bobrow and M. A. Arbib, Discrete Mathematics. Saunders, Philadelphia, London, 
Toronto, 1974. 

G. S. Boolos and R. C. Jeffrey, Computability and Logic. Cambridge University Press, 
London, New York, 1974. 

S. Brainerd, Introduction to the Mathematics of Language Study. American Elsevier, 
New York, 1971. 

N. Chomsky, Syntactic Structures. Mouton, 'S-Gravenhage, 1957. 

N. Chomsky, On the formal properties of grammars. Information and control, 2 (1959), 
137-167. 

N. Chomsky, Aspects of the Theory of Syntax. M.I.T. Press, Cambridge, MA, 1965.

N. Chomsky, Language and Mind. Harcourt, Brace, Jovanovich, New York, San Francisco, 
Atlanta, 1972. 

P. M. Cohn, Algebra and language theory. Bull. London Math. Soc. 7 (1975), 1-29. 

M. Davis, Computability and Unsolvability. McGraw-Hill, New York, Toronto, London, 
1958. 

S. Eilenberg, Automata, Languages and Machines, Vol. A. Academic Press, New York, 
1974. 

E. Engeler, Introduction to the Theory of Computation. Academic Press, New York, 
London, 1973. 

R. Fowler, An Introduction to Transformational Syntax. Routledge and Kegan Paul, 
London, 1971. 






S. Ginsburg, The Mathematical Theory of Context-Free Languages. McGraw-Hill, New
York, 1966.

A. Ginzburg, Algebraic Theory of Automata. Academic Press, New York, London, 1968.

M. Gross, Mathematical Models in Linguistics. Prentice Hall, Englewood Cliffs, NJ, 1972. 

M. Gross and A. Lentin, Introduction to Formal Grammars. Springer-Verlag, Berlin,
Heidelberg, New York, 1970.

Z. Harris, Mathematical Structures of Language. Interscience, Wiley, New York, London, 
Sydney, Toronto, 1968. 

J. E. Hopcroft and J. D. Ullman, Formal Languages and their Relation to Automata.
Addison-Wesley, Reading, MA, 1969.

S. C. Kleene, Introduction to Metamathematics. Van Nostrand, New York, 1952.

S. C. Kleene, Representations of events in nerve nets. In Shannon and McCarthy, 3-40.

R. R. Korfhage, Logic and Algorithms. Wiley, New York, London, Sydney, 1966. 

J. Lambek, How to program an infinite abacus? Canad. Math. Bull. 4 (1961), 295-302.

J. Lambek, The mathematics of sentence structure. Amer. Math. Monthly, 65 (1958),
154-170.

J. Lambek, Contributions to the mathematical analysis of the English verb phrase. J. 
Canad. Linguistic Assoc. 5 (1959), 83-89. 

J. Lambek, On the calculus of syntactic types. Proc. Symposium Appl. Math. 12 (1961),
166-178, Amer. Math. Soc., Providence, RI.

J. Lambek, A mathematician looks at French conjugation. Theoretical Linguistics, 2
(1975), 203-214.

R. D. Luce, R. R. Bush and E. Galanter, eds., Handbook of Mathematical Psychology, 3 Vols.
Wiley, New York, London, 1963, 1965.

J. Lyons, Introduction to Theoretical Linguistics. Cambridge University Press, London, 
New York, 1968. 

A. I. Mal'cev, Algorithms and recursive functions. Wolters-Noordhoff, Groningen, 1970. 

S. Marcus, Algebraic Linguistics, Analytic Models. Academic Press, New York, London, 
1967. 

Z. A. Melzak, An informal introduction to computability and computation. Canad. Math. 
Bull. 4 (1961), 279-293. 






F. Palmer, Grammar. Penguin Books, Harmondsworth, Middlesex, 1971. 

P. C. Rosenbloom, The Elements of Mathematical Logic. Dover Publications, New York, 
1950. 

D. Scott, Some definitional suggestions for automata theory. J. Computer Systems Sciences
1 (1967), 187-212.

C. E. Shannon and J. McCarthy, eds., Automata Studies. Princeton University Press,
Princeton, NJ, 1956.

A. M. Turing, On computable numbers, with an application to the Entscheidungsproblem.
Proc. London Math. Soc. (Series 2), 42 (1936-37), 230-265; 43 (1937), 544-546.

V. H. Yngve, The depth hypothesis. Proc. Symposium Appl. Math. 12 (1961), 130-138,
Amer. Math. Soc., Providence, RI.






Appendix: A mathematician looks at 
the French verb 



J. Lambek 

6 Introduction 

The penalty one has to pay for the enjoyment of French culture is that one must face the
problem of conjugating the French verb. The following considerations are offered to the
mathematically inclined reader in the hope that they may help him to bypass some of
the struggles the author went through in trying to cope with this problem. However, it
is only fair to point out that the author still does not manage to use the "simple" past and
the two subjunctives without the help of his pocket calculator.

The problem is briefly this: even forgetting about imperatives and participles, one
finds associated with each French verb V an array of 42 forms C_{ik}(V), where i ranges from 1
to 7 and k from 1 to 6. Here i denotes the "simple tenses": present, imperfect, future,
conditional, present subjunctive, simple past and past subjunctive. No purpose is served
for the moment in subdividing these into two "moods": indicative and subjunctive. The
subscript k represents the six "persons" associated with the pronouns: je, tu, il (or elle),
nous, vous, ils (or elles). Again, we have chosen not to partition these persons into two
"numbers": singular and plural.

How is the matrix C(V) constructed? It would be too bad if one had to learn 42 forms
for each verb V. In fact, a superficial investigation shows that the matrix C(V) depends
only on 5 parameters, which may be restricted to 3 on closer examination. Moreover,
for most verbs in the dictionary and for all newly formed verbs, these parameters are
predictable from the ending of the infinitive. It turns out that the conjugation matrix
C(V) may be obtained by rewriting rules from a product A(V)B, where A(V) is a 7 by
3 matrix depending on the verb V, while B is a 3 by 6 matrix not depending on the
verb. It so happens that B has only 6 nonempty entries, one in each column. Therefore,
multiplication of the matrices A(V) and B is possible, even though French is only a
semigroup and not a ring.






To make this article accessible to as many readers as possible, the present investigation
will be concerned with written French. However, one concession has been made to spoken
French: stressed vowels or diphthongs have been underlined, as an understanding of stress
makes it easier to see what is going on. It has been assumed that each verb form has
exactly one stressed vowel or diphthong. Regrettably, it has not been found convenient to
distinguish between ai pronounced é and ai pronounced è.

A very much simpler treatment could have been achieved by ignoring 10 troublesome
"exceptional" verbs:

aller, avoir, falloir, pouvoir, savoir,
valoir, vouloir, être, faire, dire.

However, these verbs occur so frequently that it was decided to include them in our
considerations.

7 The present tense 

Surprisingly, about half the effort in learning French conjugation is required for mastering
the present tense. We observe that each verb form in the present tense is made up of
two components: stem plus ending. We write X for the stem and provisionally let the
number k = 1, 2, ..., 6 stand for the ending. The actual nature of the ending depends on
the grapheme preceding it.

How the actual ending is obtained is shown by the following six rewriting rules:


(1) X1 → Xi if X ends in a,
         X if X ends in e,
         Xx if X ends in au or eu,
         Xs otherwise.

Examples: j'ai, j'aime, je peux, je finis.

(2) X2 → X if X ends in s,
         Xx if X ends in au or eu,
         Xs otherwise.

Examples: tu es, tu vaux, tu aimes.

(3) X3 → X if X ends in a, e, c, d or t,
         Xt otherwise.

Examples: il a, il aime, il vainc, il vend, il bat, il finit.

(4) X4 → Xmes if X is stressed,
         Xons if X is unstressed.

Examples: nous sommes, nous aimons.

(5) X5 → Xtes if X is stressed,
         Xez if X is unstressed.

Examples: vous faites, vous aimez.

(6) X6 → Xnt if X ends in e or o,
         Xent otherwise.

Examples: ils aiment, ils ont, ils finissent.

We hasten to point out that these same rewriting rules will be used in other tenses
later. For instance, while faites might be discounted as an exception, this method of
forming the fifth person ending will become the rule in the simple past.

The partition into stem plus ending often allows some leeway and may be based on
an arbitrary decision. For example, we have analyzed aiment as aime+nt rather than
aim+ent, but we could equally well have adopted the second alternative. Rule (6) yields
aiment in either case.

In general, there are three present stems. Consider, for example, the present stems of
devoir: je dois, tu dois, il doit, nous devons, vous devez, ils doivent. To all appearances, the
first three persons are derived from the stem doi, the fourth and fifth persons from the stem dev,
and the sixth person from the stem doiv. This is the general situation, although in many
cases two of these stems coincide and some exceptional verbs such as être appear to have
more than three stems in the present tense.

We shall denote the three present stems of the verb V as follows:

α(V), β(V), γ(V)

where the underlining again indicates stress. The full present tense is then normally
obtained by the rewriting rules (1) to (6):

α(V)1, α(V)2, α(V)3, β(V)4, β(V)5, γ(V)6

For example,

α(devoir) → doi

and

α(devoir)1 → doi1 → dois

The present stems of many French verbs may be read off from Tables I and II which 
will be found in Sections 10 and 11, respectively. These tables also contain some other 
data that will be discussed later. 
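Rules (1) to (6) are mechanical enough to be handed to a machine. The following Python sketch is mine, not the author's; the `stressed` flag stands in for the underlining in rules (4) and (5), and the reading of rule (2) follows the reconstruction given above:

    def ending(stem, k, stressed=False):
        """Attach the k-th person ending to a stem, following rewriting
        rules (1)-(6).  `stressed` plays the role of the underlining
        in rules (4) and (5)."""
        if k == 1:
            if stem.endswith("a"):
                return stem + "i"
            if stem.endswith("e"):
                return stem
            if stem.endswith(("au", "eu")):
                return stem + "x"
            return stem + "s"
        if k == 2:
            if stem.endswith("s"):
                return stem
            if stem.endswith(("au", "eu")):
                return stem + "x"
            return stem + "s"
        if k == 3:
            return stem if stem[-1] in "aecdt" else stem + "t"
        if k == 4:
            return stem + ("mes" if stressed else "ons")
        if k == 5:
            return stem + ("tes" if stressed else "ez")
        if k == 6:
            return stem + ("nt" if stem[-1] in "eo" else "ent")
        raise ValueError("person k must lie between 1 and 6")

    # The present tense of devoir from its three stems doi, dev, doiv:
    alpha, beta, gamma = "doi", "dev", "doiv"
    print([ending(alpha, k) for k in (1, 2, 3)]   # ['dois', 'dois', 'doit']
          + [ending(beta, k) for k in (4, 5)]     # ['devons', 'devez']
          + [ending(gamma, 6)])                   # ['doivent']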






8 The other tenses 

Where the present tense depends on the stems

α(V), β(V), γ(V)

the imperfect tense uses instead the stems

β(V)ai, β(V)i, β(V)ai

In other words, the full imperfect tense is obtained by rules (1) to (6) from

β(V)ai1, β(V)ai2, β(V)ai3, β(V)i4, β(V)i5, β(V)ai6

Thus the imperfect reads:

β(V)ais, β(V)ais, β(V)ait, β(V)ions, β(V)iez, β(V)aient

The future and the conditional have the stems

φ(V)ra, φ(V)r, φ(V)ro

and

φ(V)rai, φ(V)ri, φ(V)rai

respectively. Here φ(V) is a new morpheme, which is closely related to the infinitive V.
The reader will observe that rule (1) yields

φ(V)ra1 → φ(V)rai

in the future, but

φ(V)rai1 → φ(V)rais

in the conditional.

The present subjunctive makes use of the stems

γ'(V)ε, β'(V)ι, γ'(V)ε

Here γ'(V) = γ(V) and β'(V) = β(V) for all but nine of the ten exceptional verbs. Moreover,
the Greek letters ε and ι are subject to the following rewriting rules:

(ι) Xι → Xy if X ends in a or o,
         Xi otherwise.

Examples: que nous ayons, que nous soyons, que vous aimiez, que nous voyions.

(ε) Xε → X if X ends in e,
         Xi if X ends in a or o,
         Xe otherwise.

Examples: que tu aimes, qu'il ait, qu'il soit, que je meure.
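Rules (ι) and (ε) translate just as directly into code. A small Python sketch (again with my own function names), reproducing two of the examples above:

    def iota(stem):
        """Rule (iota): X-iota -> Xy if X ends in a or o, Xi otherwise."""
        return stem + ("y" if stem[-1] in "ao" else "i")

    def epsilon(stem):
        """Rule (epsilon): X-epsilon -> X if X ends in e,
        Xi if X ends in a or o, Xe otherwise."""
        if stem.endswith("e"):
            return stem
        if stem[-1] in "ao":
            return stem + "i"
        return stem + "e"

    # que nous soyons: beta'(etre) = so, rule (iota), then ending rule (4)
    print(iota("so") + "ons")   # soyons
    # qu'il ait: [ai] becomes a before epsilon-3, rule (epsilon), then rule (3)
    print(epsilon("a") + "t")   # ait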

The simple past has the stems

ψ(V), ψ(V)^, ψ(V)r

where ψ(V) is another morpheme, one that is rapidly disappearing in spoken French.
Moreover, the circumflex ^ is subject to the following rewriting rule:

(^) X^ → X if the stressed vowel of X has a circumflex,
         X with the circumflex placed on the stressed vowel otherwise.

Examples: nous aimâmes, vous vîntes, nous crûmes (from croître as well as croire).

Finally, the past subjunctive has six persons

ψ(V)sse, ψ(V)sses, ψ(V)^t, ψ̄(V)ssions, ψ̄(V)ssiez, ψ(V)ssent

where ψ̄(V) is the same as ψ(V), only without stress. With a little bit of cheating, we
may reduce these six persons to three stems as follows:

ψ(V)[sse], ψ̄(V)ssi, ψ(V)sse

provided [sse] is subject to the following rewriting rules:

[sse]k → ^k if k = 3,
         ssek if k = 1 or 2.

Note that we rewrite

aima[sse]3 → aima^3 → aimâ3 → aimât

Had there been no circumflex on the a, rule (3) would have eliminated the final t.
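The circumflex operator can also be spelled out in code. A sketch, under the simplifying assumption (mine, not the author's) that the stressed vowel of a stem is its last vowel:

    VOWELS = "aeiou"
    CIRC = {"a": "â", "e": "ê", "i": "î", "o": "ô", "u": "û"}

    def circumflex(stem):
        """Rule (^): leave the stem alone if its stressed vowel already
        carries a circumflex, otherwise put one on the stressed vowel.
        Simplifying assumption: the last vowel is the stressed one."""
        for j in range(len(stem) - 1, -1, -1):
            if stem[j] in CIRC.values():      # already circumflexed
                return stem
            if stem[j] in VOWELS:
                return stem[:j] + CIRC[stem[j]] + stem[j + 1:]
        return stem

    print(circumflex("aima") + "mes")   # aimâmes  (nous aimâmes)
    print(circumflex("vin") + "tes")    # vîntes   (vous vîntes)
    print(circumflex("cru") + "mes")    # crûmes   (nous crûmes)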



9 Factorizing the conjugation matrix 

Summarizing our considerations up to this point, we find that the conjugation matrix
C(V) may be obtained by various rewriting rules from the following matrix C'(V):



( α(V)1       α(V)2       α(V)3       β(V)4      β(V)5      γ(V)6    )
( β(V)ai1     β(V)ai2     β(V)ai3     β(V)i4     β(V)i5     β(V)ai6  )
( φ(V)ra1     φ(V)ra2     φ(V)ra3     φ(V)r4     φ(V)r5     φ(V)ro6  )
( φ(V)rai1    φ(V)rai2    φ(V)rai3    φ(V)ri4    φ(V)ri5    φ(V)rai6 )
( γ'(V)ε1     γ'(V)ε2     γ'(V)ε3     β'(V)ι4    β'(V)ι5    γ'(V)ε6  )
( ψ(V)1       ψ(V)2       ψ(V)3       ψ(V)^4     ψ(V)^5     ψ(V)r6   )
( ψ(V)[sse]1  ψ(V)[sse]2  ψ(V)[sse]3  ψ̄(V)ssi4   ψ̄(V)ssi5   ψ(V)sse6 )






Thus we have C'_{ik}(V) → C_{ik}(V).

To a mathematician it is apparent that the matrix C'(V) may be factored as follows: 



namely as the product A(V)B of the 7 by 3 matrix

A(V) =

( α(V)       β(V)      γ(V)    )
( β(V)ai     β(V)i     β(V)ai  )
( φ(V)ra     φ(V)r     φ(V)ro  )
( φ(V)rai    φ(V)ri    φ(V)rai )
( γ'(V)ε     β'(V)ι    γ'(V)ε  )
( ψ(V)       ψ(V)^     ψ(V)r   )
( ψ(V)[sse]  ψ̄(V)ssi   ψ(V)sse )

and the 3 by 6 matrix

B =

( 1  2  3  .  .  . )
( .  .  .  4  5  . )
( .  .  .  .  .  6 )

where the dots mark the empty entries of B.

We note that, even though we are in a semigroup and addition has not been defined, 
it is possible to multiply two matrices if, as in the present case, each row of the first meets 
each column of the second in at most one non-empty place. 

Whether this factorization is indicative of the internal logic of the French language, 
or whether it is a mere fluke, as linguists would tend to say, there is no harm if the 
mathematically inclined reader makes use of it as an aid to memory. 
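This addition-free multiplication is easy to state in code: take strings as the entries (words of the semigroup) with None for an empty entry, and let the product of a row and a column be defined only when at most one term is non-empty. The following Python sketch, with A(V) abbreviated to its first row for devoir, is mine:

    def semigroup_matmul(A, B):
        """Multiply matrices whose entries are words of a free semigroup
        (strings) or empty (None).  With no addition available, the
        product exists only because each row of A meets each column of B
        in at most one non-empty place."""
        n, m, p = len(A), len(B), len(B[0])
        C = [[None] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                terms = [A[i][k] + B[k][j] for k in range(m)
                         if A[i][k] is not None and B[k][j] is not None]
                assert len(terms) <= 1, "a sum would be needed here"
                C[i][j] = terms[0] if terms else None
        return C

    # First row of A(devoir), i.e. the present stems, times B:
    A = [["doi", "dev", "doiv"]]
    B = [["1", "2", "3", None, None, None],
         [None, None, None, "4", "5", None],
         [None, None, None, None, None, "6"]]
    print(semigroup_matmul(A, B)[0])
    # ['doi1', 'doi2', 'doi3', 'dev4', 'dev5', 'doiv6']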

Let us write

P_k = kth person
T_i = ith tense

then the importance of the matrix C'(V) derives from the 42 rewriting rules

P_k T_i V → C'_{ik}(V)

The significance of these rules will only become apparent when one wishes to construct
sentences as in Section 12 below.

A more radical factorization of the matrix C'(V) into a column vector and a row vector
is possible. The column vector is the middle column of the above 7 by 3 matrix, and the
row vector has the form

(Σ1 Σ2 Σ3 4 5 Σ'6)

where Σ and Σ' are certain "stress operators". But this would necessitate introducing a
whole lot of new rewriting rules, and we shall desist from such a project here.

10 Radicals of ordinary verbs 

We observe that the matrix C'(V) discussed in Section 9 depends on 5 parameters called
"radicals":

α(V), β(V), γ(V), φ(V), ψ(V)

We are not counting ψ̄(V), which differs from ψ(V) only by the absence of stress. We
are also not counting β'(V) and γ'(V), which differ from β(V) and γ(V) only for nine
exceptional verbs.






A reduction in the number of parameters to 3 is possible in view of the following 
observations: 

(i) With very few exceptions, φ(V) is obtained from the infinitive V by removing the
final r or re.

(ii) With very few exceptions, γ(V) has the same consonants as β(V) and the same
stress and vowels as α(V).

In this connection, one may also observe that very often α(V) can be obtained from
φ(V) by adding stress.

We shall refrain from giving a precise formulation of these two observations, so we
are left with 5 radicals. The five radicals of a number of ordinary verbs are listed in
Table I below. It is understood that the verbs listed serve as models for many verbs not
listed. In particular, this is so for aimer and finir, which represent the so-called "living
conjugations". Table I does not include the 10 exceptional verbs.



Table I

V          α(V)      β(V) = β'(V)   γ(V) = γ'(V)   φ(V)      ψ(V)

aimer      aime      aim            aime           aime      aima*
appeler    appelle   appel          appelle        appelle   appela*
manger     mange     mang*          mange          mange     mang*a*
placer     place     plac*          place          place     plac*a*
finir      fini      finiss         finiss         fini      fini
haïr       hai       haïss          haïss          haï       haï
mourir     meur      mour           meur           mour      mouru
sortir     sor       sort           sort           sorti     sorti
venir      vien      ven            vienn          viend     vin
devoir     doi       dev            doiv           dev       du
voir       voi       voy            voi            ver       vi
croître    croi      croiss         croiss         croît     crû
naître     nai*      naiss          naiss          naît      naqui
prendre    prend     pren           prenn          prend     pri
vendre     vend      vend           vend           vend      vendi
vaincre    vainc     vainqu         vainqu         vainc     vainqui
croire     croi      croy           croi           croi      cru
écrire     écri      écriv          écriv          écri      écrivi
lire       li        lis            lis            li        lu
plaire     plai*     plais          plais          plai      plu






Table I should be thought of as a condensed list of rewriting rules. For example, the
first entry represents the rule

α(aimer) → aime

The starred letters in Table I signal some minor irregularities, mainly of an ortho-
graphic nature:

a* becomes è before r
g* becomes ge before a or o
c* becomes ç before a or o
i* becomes î before t

Examples: ils aimèrent, nous mangeons, je plaçais, il plaît.

Of course, these remarks can be phrased as rewriting rules. For example, the first one
becomes

a*Y → èY if Y starts with r,
      aY otherwise.

It should be pointed out that, when applying rules (1) to (6), the stem X may contain
a star. Thus we have

β(manger)4 → mang*4 → mang*ons → mangeons
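The starred letters can be eliminated mechanically. A Python sketch (the encoding, writing '*' after the letter it stars, is my own device):

    def unstar(word):
        """Eliminate the starred letters a*, g*, c*, i* according to the
        orthographic adjustments above; '*' marks the preceding letter."""
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and word[i + 1] == "*":
                nxt = word[i + 2] if i + 2 < len(word) else ""
                if word[i] == "a":
                    out.append("è" if nxt == "r" else "a")
                elif word[i] == "g":
                    out.append("ge" if nxt in ("a", "o") else "g")
                elif word[i] == "c":
                    out.append("ç" if nxt in ("a", "o") else "c")
                elif word[i] == "i":
                    out.append("î" if nxt == "t" else "i")
                i += 2
            else:
                out.append(word[i])
                i += 1
        return "".join(out)

    print(unstar("aima*rent"))   # aimèrent
    print(unstar("mang*ons"))    # mangeons
    print(unstar("plac*ais"))    # plaçais
    print(unstar("plai*t"))      # plaît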

11 Radicals of exceptional verbs 

The radicals of the 10 exceptional verbs are listed in Table II, with separate listings for
β'(V) and γ'(V). Note that falloir is incomplete, as it only possesses forms in the third
person.

Table II

V         α(V)   β(V)    β'(V)   γ(V)   γ'(V)    φ(V)   ψ(V)

aller     [va]   all     all     vo     aill     i      alla*
avoir     a      av      a       o      [ai]     au     eu
falloir   fau    fall    -       -      faill    faud   fallu
pouvoir   peu    pouv    puiss   peuv   puiss    pour   pu
savoir    sai    sav     sach    sav    sach     sau    su
valoir    vau    val     val     val    vaill    vaud   valu
vouloir   veu    voul    voul    veul   veuill   voud   voulu
être      [es]   [ét]    so      so     so       se     fu
faire     fai    [fais]  fass    fo     fass     fe     fi
dire      di     [dis]   dis     dis    dis      di     di


We may regard β'(V) and γ'(V) as alternative forms (allomorphs) of β(V) and γ(V)
respectively. Other morphemes may possess alternative forms too, and when this is the
case we have placed them in square brackets. The square brackets may be removed
according to the following instructions, with a change of form in the specified context.
The understanding is that, in contexts not mentioned explicitly, the square brackets may
be removed without change.

[va] becomes vai before 1
[es] becomes sui before 1
[ét] becomes som before 4
[ét] becomes ê before 5
[fais] becomes fai before 5
[dis] becomes di before 5
[ai] becomes a before ε3

Altogether there are 7 exceptional forms: je vais, je suis, nous sommes, vous êtes, vous
faites, vous dites, qu'il ait.

We may of course rephrase the above instructions for removing square brackets as
rewriting rules. For example, the first one becomes:

[va]k → vaik if k = 1,
        vak otherwise.
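All the bracket-removal instructions fit in one small lookup table. A Python sketch (the dictionary encoding is mine, not the author's):

    # Bracket-removal instructions as a lookup: (allomorph, next symbol)
    # -> replacement; any pair not listed loses its brackets unchanged.
    SPECIAL = {
        ("va", "1"): "vai",
        ("es", "1"): "sui",
        ("ét", "4"): "som",
        ("ét", "5"): "ê",
        ("fais", "5"): "fai",
        ("dis", "5"): "di",
        ("ai", "ε3"): "a",
    }

    def unbracket(allomorph, context):
        """Remove square brackets, changing the form in the listed contexts."""
        return SPECIAL.get((allomorph, context), allomorph)

    print(unbracket("va", "1"))   # vai  -> vai1 -> vais  (je vais)
    print(unbracket("es", "1"))   # sui  -> sui1 -> suis  (je suis)
    print(unbracket("va", "2"))   # va   -> va2  -> vas   (tu vas)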



It should be pointed out that we are not allowed to apply rules (1) to (6) to an
expression containing square brackets, and the same goes for rule (ε). Thus

α(aller)1 → [va]1

to which rule (1) may not be applied; instead

[va]1 → vai1 → vais

Similarly

γ'(avoir)ε3 → [ai]ε3

to which rule (ε) may not be applied; instead

[ai]ε3 → aε3 → ai3 → ait



12 A production grammar for a fragment of French 

Our investigation has been conducted in the language of "production grammars". Ac-
tually, we are considering the free semigroup generated by an alphabet made up of two
parts:

(a) the terminal alphabet, consisting of French graphemes in sans serif, sometimes with
underlining to indicate stress;

(b) an auxiliary alphabet, consisting of grammatical terms, in the present case the fol-
lowing symbols:

1, ..., 6, α, ..., ψ, ε, ι, ^, [, ], *, P_1, ..., P_6, T_1, ..., T_7






The free semigroup is equipped with a binary relation → which satisfies the usual
reflexive, transitive and substitution laws, as well as all the rewriting rules that have been
introduced in Sections 7 to 11, specifically the following rules:

(i) for (1) to (6) in Section 7,

(ii) for ε, ι, ^ and [sse] in Section 8,

(iii) for P_k T_i in Section 9,

(iv) for α to ψ in Table I and Table II,

(v) for the star in Section 10,

(vi) for the square brackets in Section 11.

What one is really interested in is the formation of sentences. For expository purposes, 
we shall here confine attention to a small part of the French language which contains a 
few declarative sentences made up out of personal pronouns followed by inflected forms 
of intransitive verbs and also of the corresponding subjunctive clauses. 

We need some additional symbols: 



S = sentence
S_i = subjunctive clause in ith tense (i = 5, 7)
Subj_k = subject in kth person
Pred_i = predicate in ith tense
V_intr = intransitive verb


We also postulate the following new rewriting rules:

S → Subj_k Pred_i (i = 1, 2, 3, 4, 6)
S_i → que* Subj_k Pred_i (i = 5, 7)
Subj_1 → je* P_1
Subj_2 → tu P_2
Subj_3 → il P_3, elle P_3, on P_3
Subj_4 → nous P_4
Subj_5 → vous P_5
Subj_6 → ils P_6, elles P_6
V_intr → aller, sortir, venir, ...
Pred_i → T_i V_intr



as well as the following rule for the starred e of je* and que*:

e*X → eX if X starts with a consonant,
      'X if X starts with a vowel.
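The starred e behaves like French elision, and a one-function Python sketch (mine) captures it:

    def elide(word_star, nxt):
        """The e* rule: je* and que* keep their final e before a consonant
        and elide it to an apostrophe before a vowel."""
        base = word_star.rstrip("*")            # je* -> je, que* -> que
        if nxt[0] in "aeiouéèêîï":
            return base[:-1] + "'" + nxt        # j'aime, qu'il
        return base + " " + nxt                 # je vais, que tu

    print(elide("que*", "il"))    # qu'il
    print(elide("je*", "vais"))   # je vais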






Unless otherwise specified, i ranges from 1 to 7 and k from 1 to 6. 

A few examples will make it clear how declarative sentences and subjunctive clauses 
are generated by the above production grammar. 



(1) S → Subj_1 Pred_1
      → je* P_1 T_1 V_intr
      → je* P_1 T_1 aller
      → je* α(aller)1
      → je* [va]1
      → je* vai1
      → je* vais
      → je vais

(2) S → Subj_5 Pred_2
      → vous P_5 T_2 V_intr
      → vous P_5 T_2 sortir
      → vous β(sortir)i5
      → vous sorti5
      → vous sortiez

(3) S → Subj_4 Pred_3
      → nous P_4 T_3 V_intr
      → nous P_4 T_3 mourir
      → nous φ(mourir)r4
      → nous mourr4
      → nous mourrons

(4) S_5 → que* Subj_2 Pred_5
        → que* tu P_2 T_5 V_intr
        → que tu P_2 T_5 aller
        → que tu γ'(aller)ε2
        → que tu aillε2
        → que tu aille2
        → que tu ailles

(5) S → Subj_6 Pred_6
      → ils P_6 T_6 V_intr
      → ils P_6 T_6 venir
      → ils ψ(venir)r6
      → ils vinr6
      → ils vinrent

(6) S_7 → que* Subj_3 Pred_7
        → que* il P_3 T_7 V_intr
        → qu'il P_3 T_7 venir
        → qu'il ψ(venir)[sse]3
        → qu'il ψ(venir)^3
        → qu'il vin^3
        → qu'il vîn3
        → qu'il vînt
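Derivation (3) can be replayed mechanically by composing the pieces sketched earlier. A minimal Python version (hard-wiring only the rules this one derivation uses; the radical comes from Table I):

    PHI = {"mourir": "mour"}   # radical phi(V), from Table I

    def future_person4(verb):
        """S -> nous P_4 T_3 V -> nous phi(V)r4 -> nous phi(V)rons."""
        stem = PHI[verb] + "r"          # future stem phi(V)r for person 4
        return "nous " + stem + "ons"   # ending rule (4), unstressed stem

    print(future_person4("mourir"))     # nous mourrons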



REFERENCES 

1. N. Chomsky, Syntactic Structures. Mouton, 'S-Gravenhage, 1957.

2. J. Dubois and R. Lagane, La nouvelle grammaire du français. Larousse, Paris, 1973.

3. M. Gross, Mathematical Models in Linguistics. Prentice Hall, Englewood Cliffs, NJ, 
1972. 



4. A. Simond, Les verbes français. Payot, Lausanne, 1973.